Program for Wednesday, May 17th

09:00-10:00 Session 17: Keynote Speaker 3. Willem Deconinck.

Abstract

The algorithms underlying numerical weather prediction (NWP) and climate models that have been developed in the past few decades face an increasing challenge caused by the paradigm shift imposed by hardware vendors towards more energy-efficient devices. In order to provide a sustainable path to Exascale High Performance Computing (HPC), applications become increasingly restricted by energy consumption. As a result, the emerging diverse and complex hardware solutions have a large impact on the programming models traditionally used in NWP software, triggering a rethink of design choices for future massively parallel software frameworks. To this end, the ECMWF is leading the ESCAPE project, a European-funded project involving regional NWP centres, universities, HPC centres and hardware vendors. The aim is to combine interdisciplinary expertise for defining and co-designing the necessary steps towards affordable, Exascale high-performance simulations of weather and climate.

Chair:

Jesus Carretero (Universidad Carlos III de Madrid, Spain)

Location: Barrio de Salamanca

10:00-10:30 Coffee Break

10:30-12:30 Session 18A: Scheduling and Resource Management (IV)

Chair:

Sachin Shetty (Tennessee State University, USA)

Location: Goya

10:30	Vineetha Kondameedi (NVIDIA Graphics Private Ltd, Pune, India) Sathish Vadhiyar (Indian Institute of Science, India) Adaptive Hybrid Queue Configuration for Supercomputer Systems ABSTRACT. Supercomputers have batch queues to which parallel jobs with specific requirements are submitted. Commercial schedulers come with various configurable parameters for the queues which can be adjusted based on the requirements of the system. The employed configuration affects both system utilization and job response times. Oftentimes, choosing an optimal configuration with good performance is not straightforward and requires good knowledge of how the system behaves under various kinds of workloads. In this paper, we propose a dynamic scheme for setting queue configurations, namely, the number of queues, the partitioning of the processor space and the mapping of the queues to the processor partitions, and the processor size and execution time limits corresponding to the queues, based on the historical workload patterns. We use a novel non-linear programming formulation for partitioning and mapping of nodes to the queues for homogeneous HPC systems. We also propose a novel hybrid partitioned-nonpartitioned scheme for allocating processors to the jobs submitted to the queues. Our simulation results for a supercomputer system with 35,000+ CPU cores show that our hybrid scheme gives up to 74% reduction in queue waiting times and up to 12% higher utilization than static queue configurations.
11:00	Francesco Pace (Eurecom, France) Daniele Venzano (Eurecom, France) Damiano Carra (University of Verona, Italy) Pietro Michiardi (Eurecom, France) Flexible Scheduling of Distributed Analytic Applications ABSTRACT. This work addresses the problem of scheduling user-defined analytic applications, which we define as high-level compositions of frameworks, their components, and the logic necessary to carry out work. The key idea in our application definition is to distinguish classes of components, including rigid and elastic types: the former are required for an application to make progress, while the latter contribute to reduced execution times. We show that the problem of scheduling such applications poses new challenges, which existing approaches address inefficiently. Thus, we present the design and evaluation of a novel, flexible heuristic to schedule analytic applications that aims at high system responsiveness by allocating resources efficiently. Our algorithm is evaluated using trace-driven simulations with large-scale real system traces: our flexible scheduler outperforms a baseline approach across a variety of metrics, including application turnaround times and resource allocation efficiency. We also present the design and evaluation of a full-fledged system, which we have called Zoe, that incorporates the ideas presented in this paper, and report concrete improvements in terms of efficiency and performance with respect to prior generations of our system.
11:30	Omer Adam (The University of Sydney, Australia) Young Choon Lee (Macquarie University, Australia) Albert Y. Zomaya (The University of Sydney, Australia) CtrlCloud: Performance-Aware Adaptive Control for Shared Resources in Clouds ABSTRACT. Consolidating applications with conflicting service level objectives (SLOs) to share virtualized resources in cloud datacenters requires efficient resource management to ensure overall high Quality-of-Service (QoS). Applications with different performance targets often exhibit different resource demands. Thus, it is not trivial to translate individual application SLOs to corresponding resource shares in a shared virtualized environment to meet performance targets. In this paper, we present CtrlCloud, a performance-aware resource controlling system that adaptively allocates resources using a resource-share controller and an allocation optimization model. The controller automatically adapts resource demands based on performance deviations, while the optimization model resolves conflicts in resource demands from multiple co-located applications based on the performance they are currently achieving. We implement a proof-of-concept prototype of CtrlCloud in Python on top of the Xen hypervisor. Our experimental results indicate that CtrlCloud can optimize allocations of CPU resources across multiple applications to maintain the 95th percentile latency within predefined SLO targets. CtrlCloud also provides QoS differentiation while maximizing the fulfillment of CPU share demands from applications given resource availability. We further compare CtrlCloud against two other resource allocation methods commonly used in current clouds. CtrlCloud improves resource utilization by allocating resource shares that match ‘actual needs’, as it employs online share-performance modeling.
12:00	Felipe Rodrigo de Souza (Santa Catarina State University, Brazil) Charles Christian Miers (Santa Catarina State University, Brazil) Adriano Fiorese (Santa Catarina State University, Brazil) Guilherme Koslovski (Santa Catarina State University, Brazil) QoS-Aware Virtual Infrastructures Allocation on SDN-based Clouds ABSTRACT. Virtualization of computing and communication infrastructures has been promoted as a possible solution for network evolution and the deployment of new services on cloud data centers. Although promising, its effective application faces obstacles mainly caused by rigidity in the management of communication resources. Currently, the Software-Defined Networks (SDN) paradigm has been popularizing customization and flexibility in network management due to the separation of control and data planes. However, benefits introduced by SDN are not trivially applied to Virtual Infrastructures (VIs) provisioning on SDN-based cloud providers. An allocation mechanism needs joint information from the control and data planes in order to deliver Quality-of-Service (QoS)-aware mappings while achieving provider objectives. In this work, we formulate the online VI allocation on SDN-based cloud data centers as a Mixed Integer Program (MIP). Subsequently, integer constraints are relaxed to obtain a linear program, and rounding techniques are applied. The mechanism performs VI allocation considering latency, bandwidth, and virtual machine requirements. The results indicate that the mean internal latency of VIs can be reduced while simultaneously enforcing other QoS constraints.

10:30-12:30 Session 18B: Performance Modeling and Evaluation (III)

Chair:

Florin Isaila (University Carlos III of Madrid, Spain)

Location: Velazquez

10:30	Shigeru Imai (Rensselaer Polytechnic Institute, USA) Stacy Patterson (Rensselaer Polytechnic Institute, USA) Carlos A. Varela (Rensselaer Polytechnic Institute, USA) Maximum Sustainable Throughput Prediction for Data Stream Processing over Public Clouds ABSTRACT. Increasing demand for real-time data processing has led to a proliferation of scalable stream processing systems such as Storm, Flink, Samza, and Spark Streaming. A significant challenge for these streaming systems is that incoming data rates can fluctuate widely, thereby imposing computational demands on applications that cannot be known in advance and that require resources to be adjusted dynamically to be able to satisfy service level agreements (SLAs) in a cost-effective manner. Therefore, these streaming systems can take advantage of the promised elasticity of cloud computing only if we can accurately estimate their performance on different virtual machine configurations. Since performance can be hindered for many different reasons (e.g., VMs running out of memory, load imbalances, a congested shared network, or not having enough CPU power), it is challenging to estimate the minimum number of VMs that will sustain an expected throughput for a given incoming data rate in a framework- and application-agnostic manner. In this paper, we identify a common data processing environment used by modern stream processing systems and propose a framework-agnostic maximum sustainable throughput prediction model. To minimize the time and cost for model training, we statistically determine a set of training samples that use data from running an application on only a few VMs (up to 8) yet enable us to predict maximum sustainable throughput for the application running on a larger number of VMs (in our experiments, up to 32). Using four of Intel's Storm benchmarks on Amazon's EC2 public cloud, our experiments show that we can predict maximum sustainable throughput for streaming applications with less than 9% average prediction error for 16 VMs and less than 32% for 24 VMs. Further, we evaluate our prediction models with simulation-based elastic VM scheduling on a realistic workload. These simulation results show that with 10% over-provisioning, our proposed model’s cost efficiency is comparable to the optimal scaling policy without incurring any SLA violations.
10:55	Maxime Colmant (ADEME / University of Lille / Inria, France) Pascal Felber (University of Neuchatel, Switzerland) Romain Rouvoy (University of Lille / Inria / IUF, France) Lionel Seinturier (University of Lille / Inria / IUF, France) WattsKit: Software-Defined Power Monitoring of Distributed Systems ABSTRACT. The design and the deployment of energy-efficient distributed systems is a challenging task, which requires software engineers to consider all the layers of a system, from hardware to software. In particular, monitoring and analyzing the power consumption of a distributed system spanning several (potentially heterogeneous) nodes becomes particularly tedious when aiming at a finer granularity than observing the power consumption of hosting nodes. While the state-of-the-art in software-defined power meters fails to deliver adaptive solutions to offer such service-level perspective and to cope with the diversity of hardware CPU architectures, this paper proposes to automatically learn the power models of the nodes supporting a distributed system, and then to use these inferred power models to better understand how the power consumption of the system’s processes is distributed across nodes at runtime. Our solution, named WattsKit, offers a modular toolkit to build software-defined power meters à la carte, thus dealing with the diversity of user and hardware requirements. Beyond the demonstrated capability of covering a wide diversity of CPU architectures with high accuracy, we illustrate the benefits of adopting software-defined power meters to analyze the power consumption of complex layered and distributed systems, such as a software stack composed of Docker SWARM, WEAVE, ELASTICSEARCH, and Apache ZOOKEEPER. Thanks to WattsKit, developers and administrators can identify potential power leaks in their software infrastructure.
11:20	Giovanni Mariani (IBM Research, Netherlands) Andreea Anghel (IBM Research - Zurich, Switzerland) Rik Jongerius (IBM Research, Netherlands) Gero Dittmann (IBM Research - Zurich, Switzerland) Predicting cloud performance for HPC applications: a user-oriented approach ABSTRACT. Cloud computing enables end users to execute high-performance computing applications by renting the required computing power. This pay-for-use approach enables small enterprises and startups to run HPC-related businesses with a significant saving in capital investment and a short time to market. When deploying an application onto the cloud, users may a) fail to understand the interactions of the application with the software layers implementing the cloud system, b) be unaware of some hardware details of the cloud system, and c) fail to understand how sharing part of the cloud system with other users might degrade application performance. These misunderstandings may lead the user to select suboptimal cloud configurations in terms of costs or performance or both. To aid users in selecting the optimal cloud configuration for their applications, we suggest that the cloud provider generate a prediction model for the provided system. We propose to apply machine-learning techniques to generate this prediction model. First, the cloud provider profiles a set of training applications by means of a hardware-independent profiler and then executes these applications on a set of training cloud configurations to collect actual performance values. The prediction model is trained to learn the dependencies of actual performance data on the application profile and cloud configuration parameters. The advantage of using a hardware-independent profiler is that the cloud users and the cloud provider can analyze applications on different machines and interface with the same prediction model. We validate the proposed methodology for a cloud system implemented with OpenStack. We apply the prediction model to the NAS parallel benchmarks. The resulting relative error is below 15%, and the Pareto-optimal cloud configurations found when maximizing application speed and minimizing execution cost on the prediction model are within 15% of the actual optimal solutions.
11:45	Christian Davatz (University of Zurich, Switzerland) Christian Inzinger (University of Zurich, Switzerland) Joel Scheuner (University of Zurich, Switzerland) Philipp Leitner (University of Zurich, Switzerland) An Approach and Case Study of Cloud Instance Type Selection for Multi-Tier Web Applications ABSTRACT. A challenging problem for users of Infrastructure-as-a-Service (IaaS) clouds is selecting cloud providers, regions, and instance types cost-optimally for a given desired service level. This is because issues such as hardware heterogeneity, contention, and virtual machine placement can result in considerably differing performance across supposedly equivalent cloud resources. Existing research on cloud benchmarking helps, but often the focus is on providing low-level microbenchmarks (e.g., CPU or network speed), which are hard to map to concrete business metrics of enterprise cloud applications, such as request throughput of a multi-tier Web application. In this paper, we propose Okta, a general approach for fairly and comprehensively benchmarking the performance and cost of a multi-tier Web application hosted in an IaaS cloud. We exemplify our approach for a case study based on the two-tier AcmeAir application, which we evaluate for 11 real-life deployment configurations on Amazon EC2 and Google Compute Engine. Our results show that for this application, choosing compute-optimized instance types in the Web layer and small bursting instances for the database tier leads to the overall most cost-effective deployments. This result held true for both cloud providers. The least cost-effective configuration in our study provides only about 67% of throughput per US dollar spent. Our case study can serve as a blueprint for future industrial or academic application benchmarking projects.
12:10	Ronan-Alexandre Cherrueau (Inria, France) Dimitri Pertin (Inria, France) Anthony Simonet (Inria, France) Adrien Lebre (Inria, France) Matthieu Simonin (Inria, France) ENOS: a Holistic Framework for Conducting Scientific Evaluations of OpenStack ABSTRACT. By massively adopting OpenStack for operating small to large private and public clouds, the industry has made it one of the largest running software projects. Driven by an incredibly vibrant community, OpenStack has now outgrown the Linux kernel. However, with success comes an increased complexity; facing technical and scientific challenges, developers have great difficulty testing the impact of individual changes on the performance of such a large codebase, which will likely slow down the evolution of OpenStack. In the light of the difficulties the OpenStack community is facing, we claim that it is time for our scientific community to join the effort and get involved in the development and the evolution of OpenStack, as was once done for Linux. However, diving into complex software such as OpenStack is tedious: reliable tools are necessary to ease the efforts of our community and make science as collaborative as possible. In this spirit, we developed ENOS, an integrated framework that relies on container technologies for deploying and evaluating OpenStack on any testbed. ENOS allows researchers to easily express different configurations, enabling fine-grained investigations of OpenStack services. ENOS collects performance metrics at runtime and stores them for post-mortem analysis and sharing. The relevance of the ENOS approach to reproducible research is illustrated by evaluating different OpenStack scenarios on the Grid'5000 testbed.

10:30-12:30 Session 18C: Data Centers and Cyberinfrastructure

Chair:

Alfredo Cuzzocrea (ICAR-CNR and University of Calabria, Italy)

Location: Serrano

10:30	Oussama Soualah (Institut Mines-Telecom, France) Marouen Mechtri (Institut Mines-Telecom, Telecom SudParis, France) Chaima Ghribi (Institut Mines-Telecom, Telecom SudParis, France) Djamal Zeghlache (Télécom SudParis, France) Energy Efficient Algorithm for VNF Placement and Chaining ABSTRACT. This paper addresses energy-efficient VNF placement and chaining over NFV-enabled infrastructures. The placement and chaining are formulated as a tree search problem to propose a new energy-efficient service chain placement algorithm. The proposed approach is an extension of the Monte Carlo Tree Search (MCTS) method to achieve energy savings using physical resource consolidation and VNF sharing between multiple tenants. Performance evaluation results on an experimental platform show that the proposed placement solutions significantly reduce energy consumption.
11:00	Ewnetu Bayuh Lakew (Umeå University, Sweden) Alessandro Vittorio Papadopoulos (Mälardalen University, Sweden) Martina Maggio (Lund University, Sweden) Cristian Klein (Umeå University, Sweden) Erik Elmroth (Umeå University and Elastisys, Sweden) KPI-agnostic Control for Fine-Grained Vertical Elasticity ABSTRACT. Applications hosted in the cloud have become indispensable in several contexts, with their performance often being key to business operation and their running costs needing to be minimized. To minimize running costs, most modern virtualization technologies such as Linux Containers, Xen, and KVM offer powerful resource control primitives for individual provisioning that enable adding or removing fractions of cores and/or megabytes of memory with a granularity of seconds. Despite the technology being ready, there is a lack of proper techniques for fine-grained resource allocation, because there is an inherent challenge in determining the correct composition of resources an application needs, with varying workload, to ensure deterministic performance. This paper presents a control-based approach for the management of multiple resources, accounting for the resource consumption, together with the application performance, enabling fine-grained vertical elasticity. The control strategy ensures that the application meets the target performance indicators while consuming as few resources as possible. We carried out an extensive set of experiments using different applications (interactive ones with response-time requirements, as well as non-interactive ones with throughput goals) by varying the workload mixes of each application over time. The results demonstrate that our solution precisely provides guaranteed performance while at the same time avoiding both resource over- and under-provisioning.
11:30	Chunliang Hao (Institute of Software, Chinese Academy of Sciences, China) Jie Shen (Imperial College London, UK) Celia Chen (University of Southern California, USA) Heng Zhang (Institute of Software, Chinese Academy of Sciences, China) Yanjun Wu (Institute of Software, Chinese Academy of Sciences, China) Mingshu Li (Institute of Software, Chinese Academy of Sciences, China) PCSsampler: Sample based, Private-state Cluster Scheduling ABSTRACT. As a promising alternative to centralized scheduling, sample-based scheduling is especially suitable for high fan-out workloads that contain a large number of interactive jobs. Unlike centralized schedulers, existing sample-based schedulers do not hold a global view of the cluster’s resource status. Instead, the scheduling decisions are made solely based on the status of a small set of randomly sampled workers. Although this simple approach is highly efficient in large clusters, the lack of global knowledge of the cluster can lead to sub-optimal task placement decisions and difficulties in enforcing global scheduling policies. In this paper, we address these challenges in existing sample-based scheduling approaches by allowing the scheduler to maintain an approximate version of the global resource status through caching the worker nodes’ status extracted from reply messages. More specifically, we introduce the private-cluster-state technique (PCS) for the scheduler to obtain such global information. We show that the scheduler can make better scheduling decisions by utilizing PCS and can become more capable of enforcing global scheduling policies. The use of PCS is of low cost since it does not initiate new communication in sample-based scheduling. Our approach is implemented in PCSsampler, a fully distributed sample-based scheduler, which gains global knowledge from PCS. Experimental results from both simulation runs and Amazon cluster runs show that compared to Sparrow, PCSsampler can significantly reduce both 50th percentile and 90th percentile runtime. The first-time success rate of PCSsampler in gang scheduling is close to that of an omniscient centralized scheduler.
12:00	Zoltan Mann (University of Duisburg-Essen, Germany) Andreas Metzger (University of Duisburg-Essen, Germany) Optimized Cloud Deployment of Multi-tenant Software Considering Data Protection Concerns ABSTRACT. Concerns about protecting personal data and intellectual property are major obstacles to the adoption of cloud services. To ensure that a cloud tenant's data cannot be accessed by malicious code from another tenant, critical software components of different tenants are traditionally deployed on separate physical machines. However, such physical separation limits hardware utilization, leading to cost overheads due to inefficient resource usage. Secure hardware enclaves offer mechanisms to protect code and data from potentially malicious code deployed on the same physical machine, thereby offering an alternative to physical separation. We show how secure hardware enclaves can be employed to address data protection concerns of cloud tenants, while optimizing hardware utilization. We provide a model, formalization and experimental evaluation of an efficient algorithmic approach to compute an optimized deployment of software components and virtual machines, taking into account data protection concerns and the availability of secure hardware enclaves. Our experimental results suggest that even if only a small percentage of the physical machines offer secure hardware enclaves, relevant cost savings can be achieved.

12:45-13:00 Session 19: Conference closing

Chairs:

Jesus Carretero (Universidad Carlos III de Madrid, Spain)
Javier Garcia Blas (Carlos III University, Spain)
Manish Parashar (State University of New Jersey, USA)

Location: Barrio de Salamanca

13:00-14:00 Lunch Break