IISWC-2019

November 3-5, 2019

Orlando, Florida, USA


 

Program

 

 

 

Day 1, November 3rd

8:00-8:45 Breakfast
8:45-12:00 Tutorial 1 (Full day)
SST and GPGPU-Sim: An Execution-Driven CUDA Kernel Scheduler and Streaming-Multiprocessor Compute Model
Tutorial 2 (Half day)
Proxy Benchmarks for Reproducible Research
10:15-10:45 Coffee Break
12:00-1:30 Lunch (on your own)
1:30-5:00 Tutorial 1 (Full day)
SST and GPGPU-Sim: An Execution-Driven CUDA Kernel Scheduler and Streaming-Multiprocessor Compute Model
Tutorial 3 (Half day)
Challenges and Solutions for End-to-End and Across Stack ML Benchmarking
3:00-3:30 Coffee Break

 

Day 2, November 4th

8:00-8:45 Breakfast
8:45-9:00 Opening & Welcome
9:00-10:00 Keynote Address I
10:00-10:15 Coffee Break
10:15-11:55 Session 1: Best Paper Candidates
11:55-1:00 Lunch
1:00-2:40 Session 2: Memory and Storage
2:40-3:00 Coffee Break
3:00-4:30 Session 3: Hot Workloads Special Session
4:45-6:00 Session 4: Short Paper Presentations

 

Day 3, November 5th

8:00-9:00 Breakfast and Opening
9:00-10:00 Keynote Address II
10:00-10:15 Coffee Break
10:15-11:55 Session 5: Analysis and Optimization
11:55-1:00 Lunch
1:00-2:40 Session 6: AI Workloads
2:40-3:00 Coffee Break
3:00-4:40 Session 7: Benchmarking, Modeling, and Testing
4:40-5:00 Best Paper Awards and Closing

 

Program Details

 

Day 1, November 3rd

8:00-8:45 Breakfast
Room: Ballroom A
8:45-12:00 Tutorial 1 (Full day)
SST and GPGPU-Sim: An Execution-Driven CUDA Kernel Scheduler and Streaming-Multiprocessor Compute Model
Organizers: M. Zhang, M. Khairy, T. Rogers, Purdue University; C. Hughes, Sandia National Laboratories
Abstract: As components, architectures, and systems become increasingly complex, simulation has taken on a pervasive role in realizing complex engineering endeavors. Simulation is often the foremost method for understanding the intricacies of novel high-performance architectures, emerging technologies, and interconnect topologies, while gathering crucial information about energy consumption, network efficiency, and software execution. In parallel with the increased use of simulation, the number of available models for specific applications has grown, creating an urgent need for interoperability, consistency, and communication between simulation tools and their developers. To ease the use and interoperability of larger simulated systems, a standard communication methodology between models is needed.

The tutorial will introduce, explain, and expand on several key concepts. SST will be introduced as a framework for modeling and simulation. After a brief introduction, key concepts such as the simulation environment and module creation will be discussed. An overview of the simulator framework will be presented, followed by an in-depth discussion of its features and their application. This session will also introduce the new integrated GPGPU-Sim component and its many uses.
Room: Ballroom A
Tutorial 2 (Half day)
Proxy Benchmarks for Reproducible Research
Organizers: Lizy Kurian John, University of Texas at Austin
Abstract: Computer architecture research has largely relied on detailed full-system simulation with real-world workloads; however, the very long simulation times this methodology requires have begun to prohibit thorough design-space exploration. Our ongoing research has developed successful techniques to characterize benchmarks and to synthesize or clone them into miniaturized code sequences with approximately the same performance and power behavior as the original workloads. This tutorial will present the proxy-generation methodology, proxies for the SPEC CPU 2017 benchmarks, and proxies for Cassandra, MongoDB, and MySQL. It will also present SimPoints for SPEC CPU 2017 and their pinballs. The use of miniaturized proxies for reproducible research will be examined.
Room: Ballroom B
10:15-10:45 Coffee Break
Room: Ballroom A
12:00-1:30 Lunch (on your own)
1:30-5:00 Tutorial 1 (Full day)
SST and GPGPU-Sim: An Execution-Driven CUDA Kernel Scheduler and Streaming-Multiprocessor Compute Model
Organizers: M. Zhang, M. Khairy, T. Rogers, Purdue University; C. Hughes, Sandia National Laboratories
Abstract: As components, architectures, and systems become increasingly complex, simulation has taken on a pervasive role in realizing complex engineering endeavors. Simulation is often the foremost method for understanding the intricacies of novel high-performance architectures, emerging technologies, and interconnect topologies, while gathering crucial information about energy consumption, network efficiency, and software execution. In parallel with the increased use of simulation, the number of available models for specific applications has grown, creating an urgent need for interoperability, consistency, and communication between simulation tools and their developers. To ease the use and interoperability of larger simulated systems, a standard communication methodology between models is needed.

The tutorial will introduce, explain, and expand on several key concepts. SST will be introduced as a framework for modeling and simulation. After a brief introduction, key concepts such as the simulation environment and module creation will be discussed. An overview of the simulator framework will be presented, followed by an in-depth discussion of its features and their application. This session will also introduce the new integrated GPGPU-Sim component and its many uses.
Room: Ballroom A
Tutorial 3 (Half day)
Challenges and Solutions for End-to-End and Across Stack ML Benchmarking
Organizers: Wen-Mei Hwu, Abdul Dakkak, Cheng Li, University of Illinois at Urbana-Champaign; Jinjun Xiong, IBM Research
Abstract: The current landscape of Machine Learning (ML) and Deep Learning (DL) is rife with non-uniform models, frameworks, and system stacks, and it lacks standard tools and methodologies for evaluating and profiling models or systems. In the absence of such tools, the state of the practice for evaluating and comparing the benefits of proposed AI innovations (be they hardware or software) on end-to-end AI pipelines is both arduous and error-prone, stifling the adoption of these innovations in a rapidly moving field.

The goal of this tutorial is to discuss these challenges and the solutions that help address issues arising from evaluating ML models. The tutorial will educate the audience on both evaluation scenarios and hardware metrics (such as different evaluation load behaviors, power efficiency, and utilization) that benchmarking should capture. It will also educate attendees on the state-of-the-art tools and best practices developed at the IBM-ILLINOIS Center for Cognitive Computing Systems Research (C3SR), some of which have won best research paper awards at well-known international conferences. The tutorial will bring together experts from industry and academia to discuss how these tools and methodologies can be leveraged for:
- Effective ML benchmarking across hardware and software stacks
- Repeatable and consistent ML benchmarking to characterize the performance of models, frameworks, and hardware
- Identifying pitfalls and myths of current ML benchmarking methodologies
- Utilizing a model evaluation specification to help model authors and framework developers communicate evaluation parameters with the hardware and system communities
Room: Ballroom B
3:00-3:30 Coffee Break
Room: Ballroom A

 

Day 2, November 4th

8:00-8:45 Breakfast
Room: Ballroom A
8:45-9:00 Opening & Welcome
Room: Ballroom A
9:00-10:00 Keynote Address I: Accelerator-level Parallelism
Mark D. Hill, University of Wisconsin-Madison
Session Chair: TBD
Room: Ballroom A
10:00-10:15 Coffee Break
Room: Ballroom A
10:15-11:55 Session 1: Best Paper Candidates
Session Chair: TBD
Room: Ballroom A
Characterizing the Deployment of Deep Neural Networks on Commercial Edge Devices
R. Hadidi (Georgia Institute of Technology), J. Cao (Georgia Institute of Technology), Y. Xie (Georgia Institute of Technology), B. Asgari (Georgia Institute of Technology), T. Krishna (Georgia Institute of Technology), H. Kim (Georgia Institute of Technology)

One Size Doesn't Fit All: Quantifying Performance Portability of Graph Applications on GPUs
T. Sorensen (Princeton University), S. Pai (University of Rochester), A. Donaldson (Imperial College London, Google)

SimdHT-Bench: Characterizing SIMD-Aware Hash Table Designs on Emerging CPU Architectures
D. Shankar (The Ohio State University), X. Lu (The Ohio State University), D. Panda (The Ohio State University)

An Overflow-free Quantized Memory Hierarchy in General-Purpose Processors
M. Lenjani (University of Virginia), P. Gonzalez (University of Virginia), E. Sadredini (University of Virginia), M. Rahman (University of Virginia), M. Stan (University of Virginia)

11:55-1:00 Lunch
Room: TBD
1:00-2:40 Session 2: Memory and Storage
Session Chair: TBD
Room: Ballroom A
Trimming the Tail for Deterministic Read Performance in SSD
N. Elyasi (Samsung Semiconductor Inc), C. Choi (Samsung Semiconductor Inc), A. Sivasubramaniam (The Pennsylvania State University), J. Yang (Samsung Semiconductor Inc), V. Balakrishnan (Facebook)

Evaluation of Non-Volatile Memory based Last Level Cache given Modern Use Case Behavior
A. Hankin (Tufts University), T. Shapira (Tufts University), K. Sangaiah (Drexel University), M. Lui (Drexel University), M. Hempstead (Tufts University)

Comprehensive Workload-Aware DRAM Error Prediction using Machine Learning
L. Mukhanov (Queen's University Belfast), K. Tovletoglou (Queen's University Belfast), D. Nikolopoulos (Queen's University Belfast), H. Vandierendonck (Queen's University Belfast), G. Karakonstantis (Queen's University Belfast)

Faster than Flash: An In-Depth Study of System Challenges for Emerging Ultra-Low Latency SSDs
S. Koh (KAIST), J. Jang (KAIST), C. Lee (KAIST), M. Kwon (KAIST), J. Zhang (Yonsei University and KAIST), M. Jung (KAIST)

2:40-3:00 Coffee Break
Room: Ballroom A
3:00-4:30 Session 3: Hot Workloads Special Session
Session Chair: TBD
Room: Ballroom A
Invited Talk 1: At-Scale Infrastructure Challenges for Machine Learning
Carole-Jean Wu, Facebook

Invited Talk 2: MLPerf Design Challenges
Peter Mattson, Google

Invited Talk 3: Addressing the Challenges of Supporting At-Scale, Time-Sensitive Deep Learning Inference Workloads
Sridhar Lakshmanamurthy, Intel

4:45-6:00 Session 4: Short Paper Presentations
Room: The Hub
HolDCSim: A Joint Server-Network Simulator for Data Centers
F. Yao (University of Central Florida), K. Nguyen (The George Washington University), S. Dayapule (The George Washington University), B. Lu (University of California, Riverside), J. Wu (The George Washington University), S. Subramaniam (The George Washington University), G. Venkataramani (The George Washington University)

Optimizing GPU Cache Policies for ML Workloads
J. Alsop (AMD Research), M. Sinclair (University of Wisconsin-Madison, AMD Research), A. Gutierrez (AMD Research), S. Bharadwaj (AMD Research), X. Zhang (AMD Research), B. Beckmann (AMD Research), A. Dutu (AMD Research), O. Kayiran (AMD Research), M. LeBeane (AMD Research), B. Potter (AMD Research), S. Puthoor (AMD Research), T. Yeh (Purdue University)

Persistent Memory Workload Characterization: A Hardware Perspective
X. Liu (UC San Diego), B. Jupudi (Veritas Technologies LLC), P. Mehra (Samsung Electronics), J. Zhao (UC San Diego)

Fingerprinting Anomalous Computation with RNN for GPGPU-Based HPC Machines
P. Zou (Clemson University), A. Li (Pacific Northwest National Laboratory), K. Barker (Pacific Northwest National Laboratory), R. Ge (Clemson University)

Performance-driven Programming of Multi-TFLOP Deep Learning Accelerators
S. Venkataramani (IBM TJ Watson Research Center), J. Choi (IBM TJ Watson Research Center), V. Srinivasan (IBM TJ Watson Research Center), K. Gopalakrishnan (IBM TJ Watson Research Center), L. Chang (IBM TJ Watson Research Center)

Barrier Synchronization vs. Voltage Noise: A Quantitative Analysis
Z. Chowdhury (University of Minnesota, Twin Cities), S. Khatamifard (XNOR.AI), Z. Zheng (Brown University), T. Moreshet (Boston University), R. Bahar (Brown University), U. Karpuzcu (University of Minnesota, Twin Cities)

Characterizing Performance/Accuracy Tradeoffs of High Precision Applications via Auto-tuning
R. Gu (North Carolina State University), P. Beata (North Carolina State University), M. Becchi (North Carolina State University)

 

Day 3, November 5th

8:00-9:00 Breakfast and Opening
Room: TBD
9:00-10:00 Keynote Address II: TBD
Session Chair: TBD
Room: Ballroom A
10:00-10:15 Coffee Break
Room: Ballroom A
10:15-11:55 Session 5: Analysis and Optimization
Session Chair: TBD
Room: Ballroom A
Optimizing Hyperplane Sweep Operations using Asynchronous Multi-grain GPU Tasks
A. Kaushik (AMD Research, University of Waterloo), A. Aji (AMD Research), M. Hassaan (AMD Research), N. Chalmers (AMD Research), N. Wolfe (AMD Research), S. Moe (AMD Research), B. Beckmann (AMD Research), S. Puthoor (AMD Research)

Detecting Last-Level Cache Contention in Workload Colocation with Meta Learning
H. Shen (Intel Corporation), C. Li (Intel Corporation)

Branch Prediction Is Not A Solved Problem: Measurements, Opportunities, and Future Directions
C. Lin (Intel Corporation), S. Tarsa (Intel Corporation)

SNU-NPB 2019: Parallelizing and Optimizing NPB in OpenCL and CUDA for Modern GPUs
Y. Do (Seoul National University), H. Kim (Seoul National University), P. Oh (Seoul National University), D. Park (Seoul National University), J. Lee (Seoul National University)

11:55-1:00 Lunch

Room: TBD
1:00-2:40 Session 6: AI Workloads
Session Chair: TBD
Room: Ballroom A
Characterizing Deep Learning Training Workloads on Alibaba-PAI
M. Wang (Alibaba Group), C. Meng (Alibaba Group), G. Long (Alibaba Group), C. Wu (The University of Hong Kong), J. Yang (Alibaba Group), W. Lin (Alibaba Group)

Performance Aware Convolutional Neural Network Channel Pruning for Embedded GPUs
V. Radu (University of Edinburgh), K. Kaszyk (University of Edinburgh), Y. Wen (Trinity College Dublin), J. Turner (University of Edinburgh), J. Cano (University of Glasgow), E. Crowley (University of Edinburgh), B. Franke (University of Edinburgh), A. Storkey (University of Edinburgh), M. O'Boyle (University of Edinburgh)

Deep Learning Language Modeling Workloads: Where Time Goes on Graphic Processors
A. Zadeh (University of Toronto), Z. Poulos (University of Toronto), A. Moshovos (University of Toronto)

A Closer Look at Lightweight Graph Reordering
P. Faldu (The University of Edinburgh), J. Diamond (Oracle Labs), B. Grot (The University of Edinburgh)

2:40-3:00 Coffee Break
Room: Ballroom A
3:00-4:40 Session 7: Benchmarking, Modeling, and Testing
Session Chair: TBD
Room: Ballroom A
Autonomous Data-Race-Free GPU Testing
T. Ta (Cornell University), X. Zhang (AMD), A. Gutierrez (AMD), B. Beckmann (AMD)

A Benchmark For Validating x86-64 Performance Models
Y. Chen (MIT CSAIL), A. Brahmakshatriya (MIT CSAIL), C. Mendis (MIT CSAIL), A. Renda (MIT CSAIL), E. Atkinson (MIT CSAIL), O. Sykora (MIT CSAIL), S. Amarasinghe (MIT CSAIL), M. Carbin (MIT CSAIL)

Efficacy of Statistical Sampling on Contemporary Workloads: The Case of SPEC CPU2017
S. Singh (Ashoka University), M. Awasthi (Ashoka University)

Multi-Bit Upsets Vulnerability Analysis of Modern Microprocessors
A. Chatzidimitriou (University of Athens), G. Papadimitriou (University of Athens), C. Gavanas (University of Athens), G. Katsoridas (University of Athens), D. Gizopoulos (University of Athens)

4:40-5:00 Best Paper Awards and Closing