IISWC 2017 Program

IISWC-2017

Oct 1-3, 2017

Seattle, Washington, USA

Program

Day 1, Oct 1st

8:00-8:45	Breakfast
8:45-12:00	Sunday Morning - Tutorial 1 dist-gem5: Modeling and Simulating a Distributed Computer System Using Multiple Simulation Hosts
10:15-10:45	Coffee Break
12:00-1:30	Lunch (on your own)
1:30-5:00	Sunday Afternoon - Tutorial 2 Deep Learning Acceleration on Mobile Platforms
3:00-3:30	Coffee Break

Day 2, Oct 2nd

8:00-8:45	Breakfast
8:45-9:00	Opening & Welcome
9:00-10:00	Keynote Address I by Jason Cong, UCLA
10:00-10:15	Coffee Break
10:15-12:15	Session 1: Datacenters and HPC
12:15-1:30	Lunch
1:30-3:00	Session 2: Memory Systems I
3:00-3:15	Coffee Break
3:15-4:45	Session 3: I/O, Storage and VMs
4:45-5:45	Poster Session and Wine Down

Day 3, Oct 3rd

8:00-8:30	Breakfast
8:30-9:30	Keynote Address II by Derek Chiou, Microsoft
9:30-10:30	Session 4: Tail Latency
10:30-10:45	Coffee Break
10:45-12:15	Session 5: Memory Systems II
12:15-1:30	Lunch
1:30-3:30	Session 6: Mobile Systems and GPUs
3:30-3:45	Coffee Break
3:45-5:45	Session 7: Benchmarks and Soft Errors

Program Details

**Day 1, Oct 1st**
8:00-8:45	Breakfast Room: Willow Room
8:45-12:00	Sunday Morning - Tutorial 1 dist-gem5: Modeling and Simulating a Distributed Computer System Using Multiple Simulation Hosts (Slides, Demo 1, Demo 2) Organizers: Nam Sung Kim and Mohammad Alian, University of Illinois, Urbana-Champaign Abstract: The single-thread performance improvement of processors has been sluggish for the past decade as Dennard scaling approaches its fundamental physical limit. Thus, the importance of efficiently running applications on a parallel/distributed computer system has continued to increase, and diverse applications based on parallel/distributed computing models such as MapReduce and MPI have thrived. In a parallel/distributed computing system, the complex interplay amongst processor, node, and network architectures strongly affects performance and power efficiency. In particular, we observe that all the hardware and software aspects of the network, which encompass interface technology, switch/router capability, link bandwidth, topology, traffic patterns, and protocols, significantly impact processor and node activities. Therefore, to maximize performance and power efficiency, it is critical to develop optimization strategies that cut across processor, node, and network architectures, as well as their software stacks, which necessitates full-system simulation. However, our community lacks a proper research infrastructure to study the interplay of these subsystems. Facing this challenge, we have released a gem5-based simulation infrastructure dubbed dist-gem5 to support full-system simulation of a parallel/distributed computer system using multiple simulation hosts. This tutorial will cover an introduction to dist-gem5, including relevant background knowledge. Room: Willow Room
10:15-10:45	Coffee Break Room: Cedar Foyer
12:00-1:30	Lunch (on your own)
1:30-5:00	Sunday Afternoon - Tutorial 2 Deep Learning Acceleration on Mobile Platforms (Slides) Organizer: Yiran Chen, Duke University Abstract: Although Deep Neural Networks (DNNs) are ubiquitous in many applications, it is generally difficult to deploy DNNs on resource-constrained devices, e.g., mobile platforms. In practical use, both a testing (inference) phase and a sophisticated training (learning) phase are required, calling for efficient testing and training methods with higher accuracy and shorter convergence time. In this tutorial, we first introduce DNNs from a historical perspective, and then present some representative techniques to reduce the computation cost of DNNs, including network pruning, model compression, and low-precision design. In the last part of the tutorial, we will show some examples of performing and optimizing the training and testing of DNNs on distributed mobile systems. Room: Willow Room
3:00-3:30	Coffee Break Room: Cedar Foyer

**Day 2, Oct 2nd**
8:00-8:45	Breakfast Room: Cedar Foyer
8:45-9:00	Opening & Welcome Room: Cedar Room
9:00-10:00	Keynote Address I: Characterization and Acceleration for Genomic Sequencing and Analysis. Jason Cong, Distinguished Chancellor's Professor, UCLA Computer Science Department; Director, Center for Customizable Domain-Specific Computing Room: Cedar Room
10:00-10:15	Coffee Break Room: Cedar Foyer
10:15-12:15	Session 1: Datacenters and HPC Session Chair: Amro Awad Room: Cedar Room
10:15-12:15	MeNa: A Memory Navigator for Modern Hardware in Scale-out Environment. Hosein Mohammadi Makrani, Houman Homayoun (George Mason University)
	Evaluating Energy Storage for a Multitude of Uses in the Datacenter. Narayanan (Penn State), Di Wang (Microsoft), Abdullah-al Mamun, Anand Sivasubramaniam, Hosam Fathy (Penn State), Sean James (Microsoft)
	Co-Locating and Concurrent Fine-Tuning MapReduce Applications on Microservers for Energy Efficiency. Maria Malik, Houman Homayoun (George Mason University)
	AutoMatch: An Automated Framework for Relative Performance Estimation and Workload Distribution on Heterogeneous HPC Systems. Ahmed E. Helal, Wu-chun Feng, Changhee Jung, Yasser Y. Hanafy (Virginia Tech)
12:15-1:30	Lunch Room: Pacific Dining Room
1:30-3:00	Session 2: Memory Systems I Session Chair: Eric Chung Room: Cedar Room
1:30-3:00	Performance Characterization, Prediction, and Optimization for Heterogeneous Systems with Multi-Level Memory Interference. Shin-Ying Lee, Carole-Jean Wu (Arizona State University)
	A Graphics Tracing Framework for Exploring CPU+GPU Memory Systems. Andreas Sembrant, Trevor E. Carlson, Erik Hagersten, David Black-Schaffer (Uppsala University)
	Demystifying the Characteristics of 3D-Stacked Memories: A Case Study for Hybrid Memory Cube. Ramyad Hadidi, Bahar Asgari, Burhan Ahmad Mudassar, Saibal Mukhopadhyay, Sudhakar Yalamanchili, Hyesoon Kim (Georgia Tech)
3:00-3:15	Coffee Break Room: Cedar Foyer
3:15-4:45	Session 3: I/O, Storage and VMs Session Chair: Jieming Yin Room: Cedar Room
3:15-4:45	Understanding System Characteristics of Online Erasure Coding on Scalable, Distributed and Large-Scale SSD Array Systems. Sungjoon Koh, Jie Zhang, Miryeong Kwon, Jungyeon Yoon (SK Telecom), David Donofrio (Lawrence Berkeley National Laboratory), Nam Sung Kim (UIUC), Myoungsoo Jung (Yonsei University)
	TraceTracker: Hardware/Software Co-Evaluation for Large-Scale I/O Workload Reconstruction. Miryeong Kwon, Jie Zhang, Gyuyoung Park, Wonil Choi (Penn State), David Donofrio, John Shalf (Lawrence Berkeley Lab), Mahmut Kandemir (Penn State), Myoungsoo Jung (Yonsei University)
	Cross-Layer Workload Characterization of Meta-Tracing JIT VMs. Berkin Ilbeyi (Cornell University), Carl Friedrich Bolz-Tereick (Heinrich-Heine-Universität Düsseldorf), Christopher Batten (Cornell University)
4:45-5:45	Poster Session and Wine Down Room: Cedar Foyer (Entrance)
4:45-5:45	Analyzing Graphics Workloads on Tile-based GPUs. German Ceballos, Andreas Sembrant, Trevor E. Carlson, David Black-Schaffer (Uppsala University)
	Understanding Power-performance Relationship of Energy-efficient Modern DRAM Devices for Chip Multiprocessor Workloads. Sukhan Lee, Yuhwan Ro (Seoul National University), Young Hoon Son, Hyunyoon Cho (Samsung Electronics), Nam Sung Kim (UIUC), Jung Ho Ahn (Seoul National University)
	Memory Requirements of Hadoop, Spark, and MPI Based Big Data Applications on Commodity Server Class Architecture. Hosein Mohammadi Makrani, Houman Homayoun (George Mason University)
	Fine-Grained Energy Profiling for Deep Convolutional Neural Network on the Jetson TX1. Crefeda Faviola Rodrigues, Graham Riley, Mikel Lujan (The University of Manchester)
	Approximeter: Automatically Finding and Quantifying Code Sections for Approximation. Riad Akram, Abdullah Muzahid (University of Texas at San Antonio)
	Determining Work Partitioning on Closely Coupled Heterogeneous Computing Systems Using Statistical Design of Experiments. Yectli A. Huerta, Brent Swartz, David J. Lilja (University of Minnesota)
	A Framework for Fast and Fair Evaluation of Automata Processing Hardware. Xiaodong Yu, Kaixi Hou, Hao Wang, Wu-chun Feng (Virginia Tech)
	Understanding the Thermal Challenges of High-Performance Mobile Devices with a Detailed Platform Temperature Model. Ying-Ju Yu, Carole-Jean Wu (Arizona State University)

**Day 3, Oct 3rd**
8:00-8:30	Breakfast Room: Cedar Foyer
8:30-9:30	Keynote Address II: The Microsoft Catapult Project. Derek Chiou, Microsoft Room: Cedar Room
9:30-10:30	Session 4: Tail Latency Session Chair: Changhee Jung Room: Cedar Room
9:30-10:30	Workload Characterization of Interactive Cloud Services on Big and Small Server Platforms. Shuang Chen, Shay Galon, Christina Delimitrou (Cornell University), Srilatha Manne (Cavium Inc.), José F. Martínez (Cornell University)
	Why Do Programs Have Heavy Tails? Hiroshi Sasaki, Fang-Hsiang Su (Columbia University), Teruo Tanimoto (Kyushu University), Simha Sethumadhavan (Columbia University)
10:30-10:45	Coffee Break Room: Cedar Foyer
10:45-12:15	Session 5: Memory Systems II Session Chair: Andrew Putnam Room: Cedar Room
10:45-12:15	Congestion-Aware Memory Management on NUMA Platforms: A VMware ESXi Case Study. Jagadish Kotra (Penn State), Seongbeom Kim (Google), Kamesh Madduri, Mahmut T. Kandemir (Penn State)
	Work as a Team or Individual: Characterizing System-level Impacts of Main Memory Partitioning. Jung Ho Ahn, Daejin Jung, Eojin Lee, Jongwook Chung, Sukhan Lee (Seoul National University), Sheng Li (Intel)
	Exploring the Impact of Memory Block Permutation on Performance of a Crossbar ReRAM Main Memory. Morteza Ramezani, Nima Elyasi (Penn State), Mohammad Arjomand (Georgia Tech), Mahmut T. Kandemir, Anand Sivasubramaniam (Penn State)
12:15-1:30	Lunch Room: Pacific Dining Room
1:30-3:30	Session 6: Mobile Systems and GPUs Session Chair: Jieming Yin Room: Cedar Room
1:30-3:30	Exploring Computation-Communication Tradeoffs in Camera Systems. Amrita Mazumdar, Thierry Moreau, Sung Kim, Armin Alaghi, Luis Ceze, Mark Oskin, Visvesh Sathe (University of Washington)
	Characterizing Diverse Handheld Apps for Customized Hardware Acceleration. Prasanna Venkatesh Rengasamy, Haibo Zhang, Nachiappan Chidhambaram Nachiappan, Shulin Zhao, Anand Sivasubramaniam, Mahmut Kandemir, Chita R Das (Penn State)
	Moka: Model-based Concurrent Kernel Analysis. Leiming Yu, Xun Gong, Yifan Sun, Qianqian Fang (Northeastern University), Norm Rubin (NVIDIA Research), David Kaeli (Northeastern University)
	Understanding the Performance-Accuracy Tradeoffs of Floating-Point Arithmetic on GPUs. Sruthikesh Surineni, Huyen Nguyen (University of Missouri), Ruidong Gu, Michela Becchi (North Carolina State University)
3:30-3:45	Coffee Break Room: Cedar Foyer
3:45-5:45	Session 7: Benchmarks and Soft Errors Session Chair: Michael Papamichael Room: Cedar Room
3:45-5:45	LORE: A Loop Repository for the Evaluation of Compilers. Zhi Chen (UC Irvine), Zhangxiaowen Gong, Justin Josef Szaday (UIUC), David C. Wong (Intel), David Padua (UIUC), Alexandru Nicolau, Alexander V Veidenbaum, Neftali Watkinson (UC Irvine), Zehra Sura (IBM Research), Saeed Maleki (Microsoft Research), Josep Torrellas, Gerald DeJong (UIUC)
	FLiT: Cross-Platform Floating-Point Result-Consistency Tester and Workload. Geof Sawaya, Michael Bentley, Ian Briggs, Ganesh Gopalakrishnan (University of Utah), Dong H. Ahn (Lawrence Livermore National Laboratory)
	HeteroSync: A Benchmark Suite for Fine-Grained Synchronization on Tightly Coupled GPUs. Matthew D. Sinclair, Johnathan Alsop, Sarita V. Adve (UIUC)
	Characterizing The Impact of Soft Errors Across Microarchitectural Structures and Implications for Predictability. Bagus Wibowo, Abhinav Agrawal, James Tuck (North Carolina State University)