September 30 - October 2, 2018

 Raleigh, North Carolina, USA

Across the Stack Approximate Computing Opportunities for Deep Learning Acceleration

Vijayalakshmi Srinivasan
Research Staff Member at IBM T.J. Watson Research Center in Yorktown Heights

The combination of growth in compute capabilities and availability of large datasets has led to a rebirth of deep learning. Deep Neural Networks (DNNs) have become state-of-the-art in a variety of machine learning tasks spanning domains across vision, speech, and machine translation. Deep Learning (DL) achieves high accuracy in these tasks at the expense of hundreds of ExaOps of computation, posing significant challenges to efficient large-scale deployment in both resource-constrained environments and data centers. One of the key enablers for improving the operational efficiency of DNNs is the application-level observation that, when extracting deep insight from vast quantities of structured and unstructured data, the exactness imposed by traditional computing is not required. Relaxing the "exactness" constraint enables exploiting opportunities for approximate computing across all layers of the system stack.
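One widely used instance of this idea is reduced-precision arithmetic: DNN weights and activations tolerate quantization to narrow integer formats with little accuracy loss. The sketch below is purely illustrative (not the speaker's method) and assumes simple symmetric linear int8 quantization; production accelerators use more elaborate schemes such as per-channel scaling and stochastic rounding.

```python
import numpy as np

def quantize_int8(x):
    """Symmetric linear quantization of a float32 tensor to int8.

    Illustrative sketch only: maps the largest magnitude in x to 127.
    """
    scale = np.max(np.abs(x)) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float32 tensor from int8 values."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)  # stand-in for DNN weights

q, s = quantize_int8(w)
w_hat = dequantize(q, s)

# The relative error introduced is small -- exactly the kind of
# "inexactness" that DNN inference often tolerates without accuracy loss.
rel_err = np.linalg.norm(w - w_hat) / np.linalg.norm(w)
print(f"relative quantization error: {rel_err:.4f}")
```

The payoff on hardware is that int8 multiply-accumulate units are far smaller and more energy-efficient than their floating-point counterparts, which is one way relaxed exactness translates into operational efficiency.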

In this talk I will present workload-characterization-guided optimizations that led to a domain-specific, multi-TOPS AI hardware accelerator core for deep learning training and inference in systems from edge devices to data centers. Deriving high sustained utilization and energy efficiency from such an AI core requires a ground-up rethinking to exploit approximate computing across the stack, including algorithms, architecture, programmability, and hardware. I will summarize our explorations of approximate computing at different levels of the stack, based on analysis of the application's tolerance and robustness to relaxing the exactness required by traditional computing systems.


Viji Srinivasan has been a Research Staff Member at IBM T.J. Watson Research Center in Yorktown Heights since 2001. At IBM, she has worked on various aspects of data management, including energy-efficient processor designs, microarchitecture of the memory hierarchies of large-scale servers, cache coherence management of symmetric multiprocessors, integration of new memory technologies into the memory hierarchy, accelerators for data analytics applications, and more recently end-to-end accelerator solutions for AI. Many of her research contributions have been incorporated into IBM's Power and System z enterprise-class servers.