Program for Tuesday, October 23rd

08:30-09:00 Session 6: Conference Opening Session

Chair:

Behrooz Shirazi

Location: Salon 3 Grand Ballroom

09:00-10:00 Session 7: Keynote: Energy-Efficient Computing: Datacenters, Mobile Devices, and Mobile Clouds (Speaker: Massoud Pedram)

Energy consumption is a key design driver for electronic systems ranging from warehouse-size datacenters to battery-powered mobile devices to mobile clouds. It is well known that energy efficiency is best achieved by an application-specific mix of power-efficient hardware and runtime energy governance. Power-efficient hardware requires low-power devices, cell libraries, circuits, and architectures, whereas effective energy governance needs significant hardware and software support, e.g., to achieve dynamic power/performance scaling, power gating, core consolidation, and computation offloading. In my talk, I will discuss three example problems to illustrate the range of low-power solutions that can be employed and the kind of power savings that are achievable: (i) power-efficient resource management and job scheduling in a geo-distributed cloud infrastructure, (ii) design of low-power application processors that exploit the temperature effect inversion of deeply scaled devices, and (iii) energy-efficient computation offloading for deep neural networks in a mobile cloud computing environment. I will conclude with a list of best practices for power-efficient design.

Chair:

Behrooz Shirazi

Location: Salon 3 Grand Ballroom

10:30-12:00 Session 8A: Green Methodologies

Chair:

Ali Hurson

Location: Salon 1 Grand Ballroom

10:30	Erik Brunvand, Donald Kline and Alex Jones. Dark Silicon Considered Harmful: A Case for Truly Green Computing (Best Paper Nominee)
ABSTRACT. As individuals and researchers approach the challenge of green computing, it is natural to consider the energy consumption of computational devices and their supporting systems during their use phase (i.e., after they are deployed into service). However, for computing to be truly green, all phases of the system life cycle, from manufacturing to disposal, must be considered. In particular, there is limited awareness of the considerable fraction of the total life-cycle environmental impact of computing systems that results from the fabrication of the integrated circuits (ICs) used in those devices. Ironically, the trend toward dark silicon accelerators, often targeted at improving operational energy efficiency, may be counterproductive for holistic energy reduction of computing systems. The increased chip area that results from a large percentage of dark silicon may exacerbate the fabrication impacts to the point that overall sustainability actually decreases. In this paper, we explore some properties of manufacturing and operational energy efficiency and make the case that truly green computing must carefully consider the tradeoffs involved.
11:00	Reza Azimi, Chao Jing and Sherief Reda. PowerCoord: A Coordinated Power Capping Controller for Multi-CPU/GPU Servers (Best Paper Nominee)
ABSTRACT. Modern supercomputers and cloud providers rely on server nodes equipped with multiple CPU sockets and general-purpose GPUs (GPGPUs) to handle the high demand for intensive computations. These nodes consume much more power than commodity servers, and integrating them with the power capping systems used in modern clusters presents new challenges. In this paper, we propose a new power capping controller, PowerCoord, that is specifically designed for servers with multiple CPU and GPU sockets running multiple jobs at a time. PowerCoord coordinates among the various power domains (e.g., CPU sockets and GPUs) inside a server node to meet target power caps while seeking to maximize throughput. Our approach also takes job deadlines and priorities into consideration. Because performance modeling for co-located jobs is error-prone, PowerCoord uses a learning method: it has a number of heuristic policies for allocating power among the various CPUs and GPUs, and it uses reinforcement learning to select the best policy at runtime. Based on the observed state of the system, PowerCoord shifts the distribution of selected policies. We implement our power capping controller on a real multi-CPU/GPU server with low overhead and demonstrate that it meets target power caps while maximizing throughput and balancing other demands such as priorities and deadlines. Our results show that PowerCoord improves server throughput compared with the prior work POWsched.
11:30	Shikang Xu, Israel Koren and Mani Krishna. Energy and Dependability Enhancement by Dynamic Actuator Derating in Cyber-Physical Systems
ABSTRACT. The energy consumption of the cyber part of cyber-physical systems (CPSs) has attracted increasing attention in recent years. Increased energy consumption results in increased thermal damage to the processors, requiring more frequent replacement; in many applications, it also requires increased energy storage capacity. A major consumer of energy in ultra-reliable CPSs is fault tolerance, since it is implemented using redundant computations. This paper studies the use of dynamic actuator derating (i.e., artificially limiting the maximum actuator output) to reduce the required redundancy. By targeting fault tolerance, we are able to obtain significant reductions in computer energy expenditure and thermal stress without losing reliability. This has beneficial effects on processor lifetimes and required energy storage.
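The PowerCoord abstract describes a set of heuristic power-allocation policies and a reinforcement learner that shifts the distribution of selected policies based on observed system state. As a rough sketch of that selection step only, the code below uses a stateless softmax (gradient) bandit, a textbook technique chosen here for illustration and not necessarily the learner PowerCoord uses; the class, policy names, and reward values are hypothetical.

```python
import math
import random


class SoftmaxPolicySelector:
    """Hypothetical gradient-bandit selector over heuristic power-allocation policies.

    Keeps one preference per policy, samples a policy from the softmax of the
    preferences, and nudges preferences toward policies whose reward beats a
    running-average baseline. System state is ignored (a simplification).
    """

    def __init__(self, policies, step_size=0.1, seed=0):
        self.policies = list(policies)
        self.prefs = [0.0] * len(self.policies)
        self.step_size = step_size
        self.avg_reward = 0.0  # baseline: mean of all rewards seen so far
        self.steps = 0
        self.rng = random.Random(seed)

    def probabilities(self):
        top = max(self.prefs)  # subtract the max for numerical stability
        exps = [math.exp(p - top) for p in self.prefs]
        total = sum(exps)
        return [e / total for e in exps]

    def select(self):
        """Sample a policy index according to the current softmax distribution."""
        probs = self.probabilities()
        return self.rng.choices(range(len(self.policies)), weights=probs)[0]

    def update(self, chosen, reward):
        """Gradient-bandit update after applying the chosen policy."""
        self.steps += 1
        self.avg_reward += (reward - self.avg_reward) / self.steps
        advantage = reward - self.avg_reward
        probs = self.probabilities()
        for i in range(len(self.prefs)):
            indicator = 1.0 if i == chosen else 0.0
            self.prefs[i] += self.step_size * advantage * (indicator - probs[i])


# Hypothetical reward: normalized throughput achieved while staying under the cap.
REWARDS = {"proportional": 0.2, "gpu_first": 1.0, "cpu_first": 0.5}

selector = SoftmaxPolicySelector(list(REWARDS))
for _ in range(2000):
    idx = selector.select()
    selector.update(idx, REWARDS[selector.policies[idx]])

print([round(p, 3) for p in selector.probabilities()])
```

A softmax over preferences keeps every policy selectable with nonzero probability, which loosely matches the idea of shifting a distribution over policies rather than committing to one fixed policy.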

10:30-12:00 Session 8B: Multicore Systems

Chair:

Charles Suslowicz

Location: Salon 6 Grand Ballroom

10:30	Mohamed Gadou, Tania Banerjee, Meena Arunachalam, Galen Shipman and Sanjay Ranka. Multiobjective evaluation and optimization of CMT-bone on Intel Knights Landing
ABSTRACT. CMT-bone is a proxy-app for simulating compressible multiphase turbulence. The application uses discretization and numerical methods to solve partial differential equations; hence, it is both compute-intensive and memory-intensive. Intel Knights Landing (KNL) is the second-generation Many Integrated Core (MIC) architecture from Intel. It delivers massive thread parallelism, data parallelism, and memory bandwidth in a CPU form factor. In this paper, we apply several optimization techniques for Intel KNL to CMT-bone and obtain a performance speedup of 1.8x.
11:00	Anup Das, Domenico Balsamo, Geoff Merrett, Bashir M. Al-Hashimi and Francky Catthoor. Graceful Performance Adaption through Hardware-Software Interaction for Autonomous Battery Management of Multicore Smartphones
ABSTRACT. Despite advances in multicore smartphone technologies, battery consumption remains one of the features customers are least satisfied with. This is because existing energy-saving techniques do not consider the electrochemical characteristics of batteries, which causes battery consumption to vary unpredictably, both within and across applications. Additionally, these techniques impose a fixed, application-specific performance degradation in order to reduce energy consumption. Having a performance penalty even when the battery is fully charged adds to customer dissatisfaction. We propose a control-based approach for run-time power management of multicore smartphones, which scales the frequency of the processing cores in response to battery consumption, taking into account the electrochemical characteristics of the battery. The objective is to enable graceful performance modulation that adapts to the application and battery availability in a predictable manner, improving the quality of user experience. Our control approach is demonstrated in practice on embedded Linux running on a Cortex-A15-based smartphone development platform from NVIDIA. A thorough validation with mobile and Java workloads demonstrates a 2.9x improvement in battery availability compared to state-of-the-art approaches.
11:30	Shervin Hajiamini, Behrooz Shirazi, Aaron Crandall and Hassan Ghasemzadeh. A Dynamic Programming Technique for Energy-Efficient Multicore Systems
ABSTRACT. Per-core Dynamic Voltage and Frequency Scaling (DVFS) is a well-known methodology for achieving energy efficiency in multicore systems. Among static (compile-time) methods for voltage/frequency (V/F) level assignment, heuristic DVFS techniques provide fast but suboptimal assignments, while dynamic programming (DP) methods evaluate V/F levels globally and provide near-optimal solutions, but at the cost of overhead delays. We propose an efficient DP technique based on the Viterbi algorithm, which uses the Energy-Delay Product (EDP) as its objective function to predict the best V/F levels from applications' profiled information, minimizing energy consumption and execution time. We evaluate the proposed algorithm against three heuristic methods: a greedy version of our algorithm, a feedback controller method, and a simple history heuristic that uses historical performance to predict adjustments to V/F levels. Experimental results show that our algorithm outperforms the studied heuristics by an average of 12% to 24% in terms of EDP.
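The Hajiamini et al. abstract names the Viterbi algorithm as the basis of its DP technique for choosing V/F levels. A minimal sketch of a Viterbi-style (min-sum) search over a trellis of program phases and V/F levels is shown below. It rests on assumptions not stated in the abstract: each phase has profiled (energy, time) values per level, a phase's cost is its own energy-delay product, and changing levels between consecutive phases costs a fixed, hypothetical penalty. Summing per-phase EDP is a simplification; the EDP of a whole run is total energy times total time, which does not decompose additively. The function name and profile data are illustrative.

```python
from typing import List, Sequence, Tuple


def viterbi_vf_assignment(
    profile: Sequence[Sequence[Tuple[float, float]]],
    switch_penalty: float,
) -> Tuple[float, List[int]]:
    """Pick one V/F level index per phase, minimizing summed per-phase cost.

    profile[p][v] = (energy, time) of phase p at V/F level v (hypothetical data).
    Phase cost = energy * time; a level change between phases adds switch_penalty.
    """
    n_phases = len(profile)
    n_levels = len(profile[0])
    # best[v]: minimum cost of phases 0..p that ends at level v
    best = [e * t for (e, t) in profile[0]]
    back: List[List[int]] = []  # back[p-1][v]: best predecessor of level v at phase p
    for p in range(1, n_phases):
        new_best, ptrs = [], []
        for v in range(n_levels):
            # Cheapest predecessor level u, including the transition penalty
            u_best = min(
                range(n_levels),
                key=lambda u: best[u] + (switch_penalty if u != v else 0.0),
            )
            arrive = best[u_best] + (switch_penalty if u_best != v else 0.0)
            energy, time = profile[p][v]
            new_best.append(arrive + energy * time)
            ptrs.append(u_best)
        best = new_best
        back.append(ptrs)
    # Trace back from the cheapest final level to recover the full assignment
    last = min(range(n_levels), key=lambda v: best[v])
    path = [last]
    for ptrs in reversed(back):
        path.append(ptrs[path[-1]])
    path.reverse()
    return best[last], path


# Hypothetical profile: 3 phases, level 0 = low V/F, level 1 = high V/F.
PROFILE = [
    [(2.0, 3.0), (3.0, 1.5)],
    [(1.0, 2.0), (2.5, 1.0)],
    [(2.0, 3.0), (3.0, 1.5)],
]

cost, path = viterbi_vf_assignment(PROFILE, switch_penalty=1.0)
cost0, path0 = viterbi_vf_assignment(PROFILE, switch_penalty=0.0)
print(cost, path)    # → 11.5 [1, 1, 1]
print(cost0, path0)  # → 11.0 [1, 0, 1]
```

With a zero switching penalty, the search simply picks each phase's cheapest level; a nonzero penalty lets it keep a level that is locally worse to avoid transition overhead, the kind of global trade-off a per-phase greedy choice misses.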

12:00-13:30 Conference Lunch

Location: Salon 5 Grand Ballroom

13:30-15:00 Session 9A: Sustainable Computer Architectures

Chair:

Erik Brunvand

Location: Salon 1 Grand Ballroom

13:30	Seyed Mohammad Seyedzadeh, Alex Jones and Rami Melhem. Improving Sustainability Through Disturbance Crosstalk Mitigation in Deeply Scaled Phase-change Memory (Best Paper Nominee)
ABSTRACT. Phase-change memory (PCM) is a popular emerging technology for next-generation systems. PCM offers advantages over conventional memories such as DRAM and Flash due to its reduced operational energy, while also providing density advantages over DRAM and performance and endurance advantages over Flash. There are also several opportunities to address PCM's limitations, including high dynamic energy consumption (particularly for writes) and limited cell endurance, through intelligent encoding. Technology scaling provides additional density benefits for PCM but increases the proximity between cells. Unfortunately, for technologies below 22 nm, the heat used during the writing process can “bleed” to neighboring cells and lead to inadvertent writing, referred to as write disturbance. Cells can be disturbed not only in the active wordline (i.e., row) but also in the neighboring wordlines (rows) as a result of this “crosstalk.” Write disturbance causes significant system inefficiencies in both energy and performance, due to checking for disturbed cells and rewriting them to the correct value. In this paper, we develop a multi-tiered compression technique that compresses by a small amount (e.g., 40 or 56 bits of a 512-bit block) in more than 94% of the cachelines stored into memory (e.g., during eviction), without disturbing the data locality that is vital for optimizing writes into PCM. Using this small recovered space, we design a one-to-one mapping that probabilistically detects the cells with a high likelihood of disturbing neighboring cells. Using encoding, correction pointers, and a hybrid approach, we can reduce the incidence of write disturbance.
Because the reclaimed bits are used for encoding, the proposed technique requires only five (5) additional auxiliary bits per 512-bit cacheline, minimizing the embodied-energy (fabrication) overhead of mitigating write disturbance. Our experiments show that the proposed technique successfully reduces the number of disturbed cells, which translates directly into the number of extra write and read operations required for disturbance error mitigation. Specifically, our technique improves performance, endurance, and write energy by 47%, 42%, and 36%, respectively, versus the leading approach, with a minimal (<1%) increase in embodied energy.
14:00	M Meraj Ahmed, Naseef Mansoor and Amlan Ganguly. An Asymmetric, Energy Efficient One-to-Many Traffic-Aware Wireless Network-in-Package Interconnection Architecture for Multichip Systems
ABSTRACT. High Performance Computing (HPC) platforms such as blade servers consist of multiple processor chips, which may be multicore CPUs, GPUs, memory modules, and other subsystems. These high-performance, memory-intensive multichip systems require efficient support for one-to-many traffic patterns that originate from cache coherency, system-level synchronization mechanisms, and other control signals. Small portions of such traffic can introduce congestion that significantly reduces overall performance and causes energy bottlenecks unless low-latency transmission is ensured by a one-to-many traffic-aware interconnection architecture. Traditional metal-based Network-on-Chip (NoC) interconnection architectures are not suitable for such traffic, as they provide high-latency, power-hungry multi-hop paths. To address this issue, we propose the design of a one-to-many traffic-aware Wireless Network-in-Package (WiNiP) architecture by introducing a novel asymmetric wireless interconnection topology and flow control. The proposed asymmetric topology provides low-latency communication for one-to-many traffic and increases system bandwidth with lower energy consumption. Using a cycle-accurate simulator, we show that the proposed topology reduces energy by 55.92% and outperforms other interconnection architectures for both synthetic and application-specific traffic.
14:30	Yujing Feng, Han Li, Xu Tan, Xiaochun Ye, Dongrui Fan and Zhimin Tang. Optimizing network efficiency of dataflow architectures through dynamic packet merging
ABSTRACT. Dataflow processors have shown unique advantages in executing throughput computing applications thanks to their communication-exposed microarchitecture. In dataflow processors, large amounts of data are transferred directly between instructions through a network-on-chip (NoC). The throughput of the NoC strongly affects the efficiency of data transfer and the performance of dataflow processors. Based on the specific features of the dataflow network, we propose a mechanism for dynamically merging packets in the routers. Experimental results show that the average data-transfer latency is reduced by 11.8% and the performance of the dataflow accelerator is improved by 14.0%.
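As a quick check on the storage figures in the Seyedzadeh et al. abstract (our arithmetic, not the authors'), the reclaimed and auxiliary bits can be expressed as fractions of a 512-bit cacheline:

```latex
\frac{40}{512} \approx 7.8\%, \qquad
\frac{56}{512} \approx 10.9\%, \qquad
\frac{5}{512} \approx 0.98\%
```

If embodied (fabrication) energy scaled linearly with the number of stored bits, a simplifying assumption the abstract does not make, the five auxiliary bits would add just under 1%, in line with the reported <1% increase.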

13:30-15:00 Session 9B: Smart Buildings

Chair:

Mahdi Nikdast

Location: Salon 6 Grand Ballroom

13:30	Daniel Petrov, Rakan Alseghayer, Daniel Mosse and Panos K. Chrysanthis. Data-Driven User-Aware HVAC Scheduling
ABSTRACT. HVAC (Heating, Ventilation, and Air Conditioning) systems account for a significant amount of the energy spent in residential and commercial buildings. Improved wall and window insulation, energy-efficient bulbs, and building designs that facilitate a more optimal use of the thermally conditioned air within the building are among the measures taken to address the high energy usage of space conditioning. In this paper, we address a main issue that affects the energy consumption for heating and cooling buildings, namely the duty cycle of the furnaces/air conditioners. We propose a three-fold scheduling mechanism that builds on a multivariable linear regression model. Our scheduler minimizes the duty cycle without impacting users' comfort. Our experimental evaluation shows that the proposed approach saves up to 49% energy compared to commodity HVAC systems.
14:00	Luca Perilli, Eleonora Franchi Scarselli, Roberto Canegallo and Roberto La Rosa. Wake-Up Radio Impact in Self-Sustainability of Sensor and Actuator Wireless Nodes in Smart Home Applications
SPEAKER: Luca Perilli
ABSTRACT. This work discusses the impact of Wake-Up Radio (WuR) technology on extending the battery life of sensor and actuator nodes in a smart home scenario. The focus is on nodes that harvest energy from light or a temperature gradient and implement DASH7, an open-source low-power protocol supporting both query-response and beaconing communication models. A prototype WuR is used, with a quiescent current of less than 1 µA and a sensitivity of -38 dBm, compatible with indoor applications. Experimental data show that integrating a WuR is not worthwhile in nodes that need to send a message to the network coordinator periodically, e.g., sensor nodes implementing beaconing communication models. By contrast, in request-response mode, integrating the WuR reduces the average actuator current consumption from 35 µA to 6 µA during a reference period in which no data or commands are exchanged between the network coordinator and the node. Thanks to the WuR, we find that an average light intensity of 150 lux throughout daytime and less than 14 min of a 10°C temperature gradient between the hot and cold sides of a thermoelectric generator are sufficient to make the actuator nodes for water flooding and smart heating control energetically autonomous.
14:30	Elena Markoska and Sanja Lazarova-Molnar. Comparative Evaluation of Threshold Modelling for Smart Buildings’ Performance Testing
ABSTRACT. With buildings accounting for ca. 40% of the world’s total energy consumption, greater importance is given to their performance and to ensuring that they behave as originally intended. The key to timely detection of underperformance is continuous, real-time measurement of a building’s behaviour. To this end, performance testing has been developed as a practice that compares the observed behaviour and the expected behaviour of a building. The observed behaviour is obtained by applying specific calculations to meter and sensor readings. The expected behaviour can be calculated in different ways, depending on whether historical data or knowledge of the physical relationships between building components is required. We study and compare these approaches based on their difficulty of development and use, their accuracy in predicting the expected behaviour, and their ability to be integrated and run in real time. The models for simulating the energy consumption of a building are trained and calibrated on data from a case-study smart building located in Denmark, and are additionally compared against the country’s regulations for building energy consumption. The results show the superiority of the black-box model, based on its higher forecasting accuracy, lower effort of model generation and simulation, and applicability to a variety of buildings.
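The current figures in the Perilli et al. abstract translate into a relative saving as follows. This is our arithmetic; the battery-life ratio additionally assumes a fixed usable battery capacity and that the idle-period current dominates consumption, neither of which the abstract states:

```latex
\frac{I_{\text{no WuR}} - I_{\text{WuR}}}{I_{\text{no WuR}}}
  = \frac{35\,\mu\text{A} - 6\,\mu\text{A}}{35\,\mu\text{A}}
  = \frac{29}{35} \approx 82.9\%,
\qquad
\frac{t_{\text{WuR}}}{t_{\text{no WuR}}}
  \approx \frac{I_{\text{no WuR}}}{I_{\text{WuR}}}
  = \frac{35}{6} \approx 5.8
```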

15:30-17:00 Session 10: Panel 1: Advances in Data Center Energy Optimization

Moderator: Tim Hansen; Panelists:

Berk Celik, University of Technology Belfort-Montbéliard (UTBM), “DATAZERO”
Jie Li, Clarkson University, “Coordinated Operation and Planning of Data Centers as Microgrid”
Timothy M. Hansen, South Dakota State University, “Electricity Market Opportunities for Data Center Virtual Power Plants”

Chair:

Timothy Hansen

Location: Salon 4 Grand Ballroom

19:00-21:00 Student Recognition Conference Banquet (Sponsored by University of Pittsburgh) and Best Paper Award Presentation (Sponsored by Elsevier)

Location: Salon 5 Grand Ballroom