IISWC-2018, September 30 – October 2, 2018, Raleigh, North Carolina, USA
The combination of growth in compute capabilities and the availability of large datasets has led to a rebirth of deep learning. Deep Neural Networks (DNNs) have become the state of the art in a variety of machine learning tasks spanning vision, speech, and machine translation. Deep Learning (DL) achieves high accuracy in these tasks at the expense of hundreds of ExaOps of computation, posing significant challenges to efficient large-scale deployment in
both resource-constrained environments and data centers. One of the key enablers for improving the operational efficiency of DNNs is the application-level observation that, when extracting deep insight from vast quantities of structured and unstructured data, the exactness imposed by traditional computing is not required. Relaxing this "exactness" constraint opens opportunities for approximate computing across all layers of the system stack.
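As a minimal sketch of what relaxing exactness can mean in practice (an illustration not drawn from the talk itself), the example below compares a dot product computed in full fp32 precision against a reduced-precision approximation using 8-bit integer multiply-accumulates, one common form of approximate computing for DNN inference. The `quantize` helper and all parameters are hypothetical, chosen only for illustration.

```python
import numpy as np

def quantize(x, num_bits=8):
    """Uniformly quantize a float array to signed num_bits integers.
    Returns the integer codes and the scale needed to recover values."""
    scale = np.abs(x).max() / (2 ** (num_bits - 1) - 1)
    q = np.round(x / scale).astype(np.int32)
    return q, scale

rng = np.random.default_rng(0)
a = rng.standard_normal(1024).astype(np.float32)
b = rng.standard_normal(1024).astype(np.float32)

# Exact fp32 dot product.
exact = float(a @ b)

# Approximate version: integer MACs, rescaled once at the end.
qa, sa = quantize(a)
qb, sb = quantize(b)
approx = float((qa @ qb) * sa * sb)

# Error measured relative to the magnitudes of the operands.
rel_err = abs(approx - exact) / (np.linalg.norm(a) * np.linalg.norm(b))
print(f"exact={exact:.4f} approx={approx:.4f} rel_err={rel_err:.6f}")
```

For well-conditioned inputs like these, the 8-bit approximation typically tracks the exact result to within a small fraction of the operand magnitudes, which is the kind of error tolerance that DNN workloads exhibit at the application level.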
In this talk I will present workload-characterization-guided optimizations that led to a domain-specific, multi-TOPs AI hardware accelerator core for deep learning training and inference in systems ranging from edge devices to data centers. Deriving high sustained utilization and energy efficiency from such an AI core requires ground-up rethinking to exploit approximate computing across the stack, including algorithms, architecture, programmability, and hardware. I will summarize our explorations of approximate computing at different levels of the stack, based on analysis of applications' tolerance and robustness to relaxing the exactness required by traditional computing systems.
Viji Srinivasan has been a Research Staff Member at the IBM T.J. Watson Research Center in Yorktown Heights since 2001. At IBM, she has worked on various aspects of data management, including energy-efficient processor designs, microarchitecture of the memory hierarchies of large-scale servers, cache-coherence management of symmetric multiprocessors, integration of new memory technologies into the memory hierarchy, accelerators for data-analytics applications, and, more recently, end-to-end accelerator solutions for AI. Many of her research contributions have been incorporated into IBM's Power & System-z enterprise-class servers.