IISWC-2019

November 3-5, 2019

Orlando, Florida, USA


 

Program

 

 

 

Day 1, November 3rd

8:00-8:45 Breakfast
8:45-12:00 Tutorial 1 (Full day)
SST and GPGPU-Sim: An Execution-Driven CUDA Kernel Scheduler and Streaming-Multiprocessor Compute Model
Tutorial 2 (Half day)
Proxy Benchmarks for Reproducible Research
10:15-10:45 Coffee Break
12:00-1:30 Lunch (on your own)
1:30-5:00 Tutorial 1 (Full day)
SST and GPGPU-Sim: An Execution-Driven CUDA Kernel Scheduler and Streaming-Multiprocessor Compute Model
Tutorial 3 (Half day)
Challenges and Solutions for End-to-End and Across Stack ML Benchmarking
3:00-3:30 Coffee Break

 

Day 2, November 4th

8:00-8:45 Breakfast
8:45-9:00 Opening & Welcome
9:00-10:00 Keynote Address I
10:00-10:15 Coffee Break
10:15-11:55 Session 1: Best Paper Candidates
11:55-1:00 Lunch
1:00-2:40 Session 2: Memory and Storage
2:40-3:00 Coffee Break
3:00-4:30 Session 3: Hot Workloads Special Session
4:45-6:00 Session 4: Short Paper Presentations

 

Day 3, November 5th

8:00-9:00 Breakfast and Opening
9:00-10:00 Keynote Address II
10:00-10:15 Coffee Break
10:15-11:55 Session 5: Analysis and Optimization
11:55-1:00 Lunch
1:00-2:40 Session 6: AI Workloads
2:40-3:00 Coffee Break
3:00-4:40 Session 7: Benchmarking, Modeling, and Testing
4:40-5:00 Best Paper Awards and Closing

 

Program Details

 

Day 1, November 3rd

8:00-8:45 Breakfast
Room: Ballroom A
8:45-12:00 Tutorial 1 (Full day)
SST and GPGPU-Sim: An Execution-Driven CUDA Kernel Scheduler and Streaming-Multiprocessor Compute Model
Organizers: M. Zhang, M. Khairy, T. Rogers, Purdue University; C. Hughes, Sandia National Laboratories
Abstract: As components, architectures, and systems become increasingly complex, simulation has taken on a pervasive role in realizing complex engineering endeavors. Simulation is often the foremost method for understanding the intricacies of novel high-performance architectures, emerging technologies, and interconnect topologies, while gathering crucial information about energy consumption, network efficiency, and software execution. In parallel with the increased use of simulation, the number of available models for specific applications has grown, creating an urgent need for interoperability, consistency, and communication between simulation tools and their developers. To ease the use and interoperability of larger simulated systems, a standard communication methodology between models is needed.

The tutorial will introduce, explain, and expand on several key concepts. SST will be introduced as a framework for modeling and simulation. After a brief introduction, key concepts such as the simulation environment and module creation will be discussed. An overview of the simulator framework will be presented, followed by an in-depth discussion of its features and their application. This session will also introduce the new integrated GPGPU-Sim component and its many uses.
Room: Ballroom A
Tutorial 2 (Half day)
Proxy Benchmarks for Reproducible Research
Organizers: Lizy Kurian John, University of Texas at Austin
Abstract: Computer architecture research has largely relied on detailed full-system simulation with real-world workloads; however, the very long simulation times this methodology requires have begun to prohibit thorough design-space exploration. Our ongoing research has developed successful techniques to characterize benchmarks and to synthesize or clone them into miniaturized code sequences with approximately the same performance and power behavior as the original workloads. This tutorial will present the proxy-generation methodology, proxies for the SPEC CPU 2017 benchmarks, and proxies for Cassandra, MongoDB, and MySQL. It will also present SimPoints for SPEC CPU 2017 and their pinballs. The use of miniaturized proxies for reproducible research will be examined.
Room: Ballroom B
10:15-10:45 Coffee Break
Room: Ballroom A
12:00-1:30 Lunch (on your own)
1:30-5:00 Tutorial 1 (Full day)
SST and GPGPU-Sim: An Execution-Driven CUDA Kernel Scheduler and Streaming-Multiprocessor Compute Model
Organizers: M. Zhang, M. Khairy, T. Rogers, Purdue University; C. Hughes, Sandia National Laboratories
Abstract: As components, architectures, and systems become increasingly complex, simulation has taken on a pervasive role in realizing complex engineering endeavors. Simulation is often the foremost method for understanding the intricacies of novel high-performance architectures, emerging technologies, and interconnect topologies, while gathering crucial information about energy consumption, network efficiency, and software execution. In parallel with the increased use of simulation, the number of available models for specific applications has grown, creating an urgent need for interoperability, consistency, and communication between simulation tools and their developers. To ease the use and interoperability of larger simulated systems, a standard communication methodology between models is needed.

The tutorial will introduce, explain, and expand on several key concepts. SST will be introduced as a framework for modeling and simulation. After a brief introduction, key concepts such as the simulation environment and module creation will be discussed. An overview of the simulator framework will be presented, followed by an in-depth discussion of its features and their application. This session will also introduce the new integrated GPGPU-Sim component and its many uses.
Room: Ballroom A
Tutorial 3 (Half day)
Challenges and Solutions for End-to-End and Across Stack ML Benchmarking
Organizers: Wen-Mei Hwu, Abdul Dakkak, Cheng Li, University of Illinois at Urbana-Champaign; Jinjun Xiong, IBM Research
Abstract: The current landscape of Machine Learning (ML) and Deep Learning (DL) is rife with non-uniform models, frameworks, and system stacks, and it lacks standard tools and methodologies for evaluating and profiling models or systems. In the absence of such tools, the state of the practice for evaluating and comparing the benefits of proposed AI innovations (be they hardware or software) on end-to-end AI pipelines is both arduous and error-prone, stifling the adoption of these innovations in a rapidly moving field.

The goal of this tutorial is to discuss these challenges and the solutions that help address issues arising from evaluating ML models. The tutorial will educate the audience on both evaluation scenarios and hardware metrics (such as different evaluation load behaviors, power efficiency, and utilization) that benchmarking should capture. It will also educate attendees on the state-of-the-art tools and best practices developed at the IBM-ILLINOIS Center for Cognitive Computing Systems Research (C3SR), some of which have won best research paper awards at well-known international conferences. The tutorial will bring together experts from industry and academia to discuss how these tools and methodologies can be leveraged for:
- Effective ML benchmarking across hardware and software stacks
- Repeatable and consistent ML benchmarking to characterize the performance of models, frameworks, and hardware
- Identifying pitfalls and myths of current ML benchmarking methodologies
- Utilizing a model evaluation specification to help model authors and framework developers communicate evaluation parameters with the hardware and system communities
Room: Ballroom B
3:00-3:30 Coffee Break
Room: Ballroom A

 

Day 2, November 4th

8:00-8:45 Breakfast
Room: Ballroom A
8:45-9:00 Opening & Welcome
Room: Ballroom A
9:00-10:00 Keynote Address I: Accelerator-level Parallelism
Mark D. Hill, University of Wisconsin-Madison
Session Chair: TBD
Room: Ballroom A
10:00-10:15 Coffee Break
Room: Ballroom A
10:15-11:55 Session 1: Best Paper Candidates
Session Chair: TBD
Room: Ballroom A
Characterizing the Deployment of Deep Neural Networks on Commercial Edge Devices
R. Hadidi (Georgia Institute of Technology), J. Cao (Georgia Institute of Technology), Y. Xie (Georgia Institute of Technology), B. Asgari (Georgia Institute of Technology), T. Krishna (Georgia Institute of Technology), H. Kim (Georgia Institute of Technology)

One Size Doesn't Fit All: Quantifying Performance Portability of Graph Applications on GPUs
T. Sorensen (Princeton University), S. Pai (University of Rochester), A. Donaldson (Imperial College London, Google)

SimdHT-Bench: Characterizing SIMD-Aware Hash Table Designs on Emerging CPU Architectures
D. Shankar (The Ohio State University), X. Lu (The Ohio State University), D. Panda (The Ohio State University)

An Overflow-free Quantized Memory Hierarchy in General-Purpose Processors
M. Lenjani (University of Virginia), P. Gonzalez (University of Virginia), E. Sadredini (University of Virginia), M. Rahman (University of Virginia), M. Stan (University of Virginia)

11:55-1:00 Lunch
Room: TBD
1:00-2:40 Session 2: Memory and Storage
Session Chair: TBD
Room: Ballroom A
Trimming the Tail for Deterministic Read Performance in SSD
N. Elyasi (Samsung Semiconductor Inc), C. Choi (Samsung Semiconductor Inc), A. Sivasubramaniam (The Pennsylvania State University), J. Yang (Samsung Semiconductor Inc), V. Balakrishnan (Facebook)

Evaluation of Non-Volatile Memory based Last Level Cache given Modern Use Case Behavior
A. Hankin (Tufts University), T. Shapira (Tufts University), K. Sangaiah (Drexel University), M. Lui (Drexel University), M. Hempstead (Tufts University)

Comprehensive Workload-Aware DRAM Error Prediction using Machine Learning
L. Mukhanov (Queen's University Belfast), K. Tovletoglou (Queen's University Belfast), D. Nikolopoulos (Queen's University Belfast), H. Vandierendonck (Queen's University Belfast), G. Karakonstantis (Queen's University Belfast)

Faster than Flash: An In-Depth Study of System Challenges for Emerging Ultra-Low Latency SSDs
S. Koh (KAIST), J. Jang (KAIST), C. Lee (KAIST), M. Kwon (KAIST), J. Zhang (Yonsei University and KAIST), M. Jung (KAIST)

2:40-3:00 Coffee Break
Room: Ballroom A
3:00-4:30 Session 3: Hot Workloads Special Session
Session Chair: TBD
Room: Ballroom A
Invited Talk 1: At-Scale Infrastructure Challenges for Machine Learning
Carole-Jean Wu, Facebook

Invited Talk 2: MLPerf Design Challenges
Peter Mattson, Google

Invited Talk 3: Addressing the Challenges of Supporting At-Scale, Time-Sensitive Deep Learning Inference Workloads
Sridhar Lakshmanamurthy, Intel

4:45-6:00 Session 4: Short Paper Presentations
Room: The Hub
HolDCSim: A Joint Server-Network Simulator for Data Centers
F. Yao (University of Central Florida), K. Nguyen (The George Washington University), S. Dayapule (The George Washington University), B. Lu (University of California, Riverside), J. Wu (The George Washington University), S. Subramaniam (The George Washington University), G. Venkataramani (The George Washington University)

Optimizing GPU Cache Policies for ML Workloads
J. Alsop (AMD Research), M. Sinclair (University of Wisconsin-Madison, AMD Research), A. Gutierrez (AMD Research), S. Bharadwaj (AMD Research), X. Zhang (AMD Research), B. Beckmann (AMD Research), A. Dutu (AMD Research), O. Kayiran (AMD Research), M. LeBeane (AMD Research), B. Potter (AMD Research), S. Puthoor (AMD Research), T. Yeh (Purdue University)

Persistent Memory Workload Characterization: A Hardware Perspective
X. Liu (UC San Diego), B. Jupudi (Veritas Technologies LLC), P. Mehra (Samsung Electronics), J. Zhao (UC San Diego)

Fingerprinting Anomalous Computation with RNN for GPGPU-Based HPC Machines
P. Zou (Clemson University), A. Li (Pacific Northwest National Laboratory), K. Barker (Pacific Northwest National Laboratory), R. Ge (Clemson University)

Performance-driven Programming of Multi-TFLOP Deep Learning Accelerators
S. Venkataramani (IBM TJ Watson Research Center), J. Choi (IBM TJ Watson Research Center), V. Srinivasan (IBM TJ Watson Research Center), K. Gopalakrishnan (IBM TJ Watson Research Center), L. Chang (IBM TJ Watson Research Center)

Barrier Synchronization vs. Voltage Noise: A Quantitative Analysis
Z. Chowdhury (University of Minnesota, Twin Cities), S. Khatamifard (XNOR.AI), Z. Zheng (Brown University), T. Moreshet (Boston University), R. Bahar (Brown University), U. Karpuzcu (University of Minnesota, Twin Cities)

Characterizing Performance/Accuracy Tradeoffs of High Precision Applications via Auto-tuning
R. Gu (North Carolina State University), P. Beata (North Carolina State University), M. Becchi (North Carolina State University)

 

Day 3, November 5th

8:00-9:00 Breakfast and Opening
Room: TBD
9:00-10:00 Keynote Address II: TBD
Session Chair: TBD
Room: Ballroom A
10:00-10:15 Coffee Break
Room: Ballroom A
10:15-11:55 Session 5: Analysis and Optimization
Session Chair: TBD
Room: Ballroom A
Optimizing Hyperplane Sweep Operations using Asynchronous Multi-grain GPU Tasks
A. Kaushik (AMD Research, University of Waterloo), A. Aji (AMD Research), M. Hassaan (AMD Research), N. Chalmers (AMD Research), N. Wolfe (AMD Research), S. Moe (AMD Research), B. Beckmann (AMD Research), S. Puthoor (AMD Research)

Detecting Last-Level Cache Contention in Workload Colocation with Meta Learning
H. Shen (Intel Corporation), C. Li (Intel Corporation)

Branch Prediction Is Not A Solved Problem: Measurements, Opportunities, and Future Directions
C. Lin (Intel Corporation), S. Tarsa (Intel Corporation)

SNU-NPB 2019: Parallelizing and Optimizing NPB in OpenCL and CUDA for Modern GPUs
Y. Do (Seoul National University), H. Kim (Seoul National University), P. Oh (Seoul National University), D. Park (Seoul National University), J. Lee (Seoul National University)

11:55-1:00 Lunch

Room: TBD
1:00-2:40 Session 6: AI Workloads
Session Chair: TBD
Room: Ballroom A
Characterizing Deep Learning Training Workloads on Alibaba-PAI
M. Wang (Alibaba Group), C. Meng (Alibaba Group), G. Long (Alibaba Group), C. Wu (The University of Hong Kong), J. Yang (Alibaba Group), W. Lin (Alibaba Group)

Performance Aware Convolutional Neural Network Channel Pruning for Embedded GPUs
V. Radu (University of Edinburgh), K. Kaszyk (University of Edinburgh), Y. Wen (Trinity College Dublin), J. Turner (University of Edinburgh), J. Cano (University of Glasgow), E. Crowley (University of Edinburgh), B. Franke (University of Edinburgh), A. Storkey (University of Edinburgh), M. O'Boyle (University of Edinburgh)

Deep Learning Language Modeling Workloads: Where Time Goes on Graphic Processors
A. Zadeh (University of Toronto), Z. Poulos (University of Toronto), A. Moshovos (University of Toronto)

A Closer Look at Lightweight Graph Reordering
P. Faldu (The University of Edinburgh), J. Diamond (Oracle Labs), B. Grot (The University of Edinburgh)

2:40-3:00 Coffee Break
Room: Ballroom A
3:00-4:40 Session 7: Benchmarking, Modeling, and Testing
Session Chair: TBD
Room: Ballroom A
Autonomous Data-Race-Free GPU Testing
T. Ta (Cornell University), X. Zhang (AMD), A. Gutierrez (AMD), B. Beckmann (AMD)

A Benchmark For Validating x86-64 Performance Models
Y. Chen (MIT CSAIL), A. Brahmakshatriya (MIT CSAIL), C. Mendis (MIT CSAIL), A. Renda (MIT CSAIL), E. Atkinson (MIT CSAIL), O. Sykora (MIT CSAIL), S. Amarasinghe (MIT CSAIL), M. Carbin (MIT CSAIL)

Efficacy of Statistical Sampling on Contemporary Workloads: The Case of SPEC CPU2017
S. Singh (Ashoka University), M. Awasthi (Ashoka University)

Multi-Bit Upsets Vulnerability Analysis of Modern Microprocessors
A. Chatzidimitriou (University of Athens), G. Papadimitriou (University of Athens), C. Gavanas (University of Athens), G. Katsoridas (University of Athens), D. Gizopoulos (University of Athens)

4:40-5:00 Best Paper Awards and Closing