CCGRID 2017: 17TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID
PROGRAM FOR TUESDAY, MAY 16TH

08:45-09:30 Session 12: Keynote Speaker 2. Qian Depei.

Abstract

After a brief review of HPC research and development under China's high-tech R&D program in past years, this talk will introduce the new key project on high performance computing in China's national key R&D program under the 13th five-year plan. The major challenges and technical issues in developing an exascale system will be discussed. The goal, the major activities, and the current status of the new key project will be presented.

Chair:
Manish Parashar (Rutgers, The State University of New Jersey, USA)
Location: Barrio de Salamanca
09:30-10:30 Session 13: Best Student Paper Award
Chair:
Javier Garcia Blas (Carlos III University, Spain)
Location: Barrio de Salamanca
09:30
Terry Penner (Texas State University, USA)
Mina Guirguis (Texas State University, USA)
Combating the Bandits in the Cloud: A Moving Target Defense Approach

ABSTRACT. Security and privacy in cloud computing are critical components for various organizations that depend on the cloud in their daily operations. Customers' data and the organizations' proprietary information have been subject to various attacks in the past. In this paper, we develop a set of Moving Target Defense (MTD) strategies that randomize the location of the Virtual Machines (VMs) to harden the cloud against a class of Multi-Armed Bandit (MAB) policy-based attacks. These attack policies capture the behavior of adversaries that seek to explore the allocation of VMs in the cloud and exploit the ones that provide the highest rewards (e.g., critical datasets, credit card transactions, etc.). We assess the performance of our MTD strategies through simulation experiments, showing that they can make MAB policy-based attacks no more effective than random attack policies. Additionally, we show the effects of critical parameters – such as discount factors, the time between randomizations of the VM locations, and the variance in the rewards obtained – on the performance of our defense. We validate our results through simulations and a real OpenStack implementation in our lab, assessing migration times and downtimes under different system loads.
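The gap between a learning attacker and a randomized defense can be illustrated with a small simulation (a minimal sketch with invented parameters; the epsilon-greedy policy, reward values, and migration period below are illustrative assumptions, not the paper's models):

```python
import random

def simulate(num_vms=20, steps=5000, migrate_every=None, eps=0.1):
    """Epsilon-greedy MAB attacker probing VM slots for rewards.
    migrate_every=None disables MTD; otherwise VM locations are
    re-randomized every migrate_every steps, invalidating whatever
    the attacker has learned about each slot."""
    rewards = [random.random() for _ in range(num_vms)]  # per-VM payoff
    est = [0.0] * num_vms    # attacker's running reward estimate per slot
    cnt = [0] * num_vms
    total = 0.0
    for t in range(steps):
        if migrate_every and t % migrate_every == 0:
            random.shuffle(rewards)                      # MTD: move the VMs
            est = [0.0] * num_vms                        # knowledge is lost
            cnt = [0] * num_vms
        if random.random() < eps or not any(cnt):
            arm = random.randrange(num_vms)              # explore
        else:
            arm = max(range(num_vms), key=lambda i: est[i])  # exploit
        r = rewards[arm]
        cnt[arm] += 1
        est[arm] += (r - est[arm]) / cnt[arm]            # incremental mean
        total += r
    return total / steps

random.seed(0)
print("no defense :", round(simulate(), 3))
print("MTD active :", round(simulate(migrate_every=50), 3))
```

With migration enabled, the attacker's estimates are invalidated before they pay off, pulling its average reward down toward that of a random policy.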

09:55
Pedro A. R. S. Costa (Faculty of Sciences, University of Lisboa, Portugal)
Fernando Ramos (Faculty of Sciences, University of Lisboa, Portugal)
Miguel Correia (INESC-ID, Instituto Superior Técnico, Universidade de Lisboa, Portugal)
Chrysaor: Fine-Grained, Fault-Tolerant Cloud-of-Clouds MapReduce

ABSTRACT. MapReduce is a framework for processing large data sets that is widely used in cloud computing. MapReduce implementations like Hadoop can tolerate crashes and file corruptions, but not arbitrary faults. Unfortunately, there is evidence that arbitrary faults do occur and can affect the correctness of MapReduce job executions. Furthermore, many outages of major cloud offerings have been reported, raising concerns about the dependence on a single cloud.

In this paper we propose a novel execution system that makes it possible to scale out MapReduce computations to a cloud-of-clouds and tolerate arbitrary faults, malicious faults, and cloud outages. Our system, Chrysaor, is based on a fine-grained replication scheme that tolerates faults at the task level. Our solution has three important properties: it tolerates the above-mentioned classes of faults at reasonable cost; it requires minimal modifications to users' applications; and it does not involve changes to the Hadoop source code. We performed an extensive evaluation of our system on Amazon EC2, showing that our fine-grained solution is more efficient in terms of computation because it recovers only faulty tasks. This is achieved without incurring a significant penalty in the baseline case (i.e., without faults) for most workloads.

10:30-11:00 Coffee Break
11:00-12:30 Session 14A: Performance Modeling and Evaluation (II)
Chair:
Young Choon Lee (Macquarie University, Australia)
Location: Velazquez
11:00
Fulya Kaplan (Boston University, USA)
Ozan Tuncer (Boston University, USA)
Vitus Leung (Sandia National Laboratories, USA)
Scott Hemmert (Sandia National Laboratories, USA)
Ayse Coskun (Boston University, USA)
Unveiling the Interplay Between Global Link Arrangements and Network Management Algorithms on Dragonfly Networks

ABSTRACT. Network messaging delay historically constitutes a large portion of the wall-clock time for High Performance Computing (HPC) applications, as these applications run on many nodes and involve intensive communication among their tasks. The dragonfly network topology has emerged as a promising solution for building exascale HPC systems owing to its low network diameter and large bisection bandwidth. A dragonfly includes local links that form groups and global links that connect these groups via high-bandwidth optical links. Many aspects of dragonfly network design are yet to be explored, such as the performance impact of the connectivity of the global links, i.e., global link arrangements, the bandwidth of the local and global links, or the job allocation algorithm.

This paper first introduces a packet-level simulation framework to model the performance of HPC applications in detail. The proposed framework is able to simulate known MPI routines as well as applications with custom-defined communication patterns for a given job placement algorithm and network topology. Using this simulation framework, we investigate the coupling between global link bandwidth and arrangements, communication pattern and intensity, job allocation and task mapping algorithms, and routing mechanisms in dragonfly topologies. We demonstrate that by choosing the right combination of system settings and workload allocation algorithms, communication overhead can be decreased by up to 44%. We also show that the circulant arrangement provides up to 15% higher bisection bandwidth than the other arrangements, but for realistic workloads the impact of link arrangements is less than 3%.

11:20
James Phung (The University of Sydney, Australia)
Young Choon Lee (Macquarie University, Australia)
Albert Y. Zomaya (The University of Sydney, Australia)
Application-Agnostic Power Monitoring in Virtualized Environments

ABSTRACT. Many servers use technologies such as virtualization or containerization to improve server utilization. These technologies pose challenges for power monitoring, since it is not possible to directly measure the power use of an abstraction such as a virtual machine. Much work has been done in modeling the power use of CPUs, virtual machines, and entire servers; however, there is a scarcity of work on building lightweight power monitoring middleware that can be deployed across a range of systems. In this paper, we present cWatts+, a prototype lightweight software-based virtual power meter. Utilizing a simple but powerful application-agnostic power model, it offers performance comparable to existing "more complex and heavier-weight" power models. It uses a small number of widely available CPU event counters and the Performance Application Programming Interface (PAPI) library to estimate power usage on a per-thread basis. It has minimal overhead and is portable across a variety of systems. It can be used in containerized or virtualized environments. We evaluate the estimation performance of cWatts+ for a variety of real-world benchmarks that are relevant to large distributed systems. We also examine the importance of including CPU core temperature data in the power model. We demonstrate that our power model has an average error of less than 5%. This result compares favorably with existing state-of-the-art power models and is achieved using a relatively simple power model that itself incurs minimal overhead. Consequently, our power monitoring middleware is viable for use in real-world applications such as power estimation for energy-aware scheduling.
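The general form of such a counter-based model can be sketched as follows (the counters, coefficients, and linear form are illustrative assumptions, not the cWatts+ model itself):

```python
# Hypothetical per-thread power model built from a few CPU event
# counters plus core temperature, in the spirit the abstract describes.
# Coefficients are made-up constants, not fitted values.
def estimate_power_watts(instructions, cycles, llc_misses, temp_c):
    ipc = instructions / max(cycles, 1)   # instructions per cycle
    return (2.1 * ipc                     # activity-proportional term
            + 1.5e-8 * llc_misses         # memory-traffic term
            + 0.04 * temp_c               # leakage grows with temperature
            + 3.0)                        # static baseline

print(estimate_power_watts(8e9, 4e9, 2e7, 65))  # a few watts per thread
```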

11:40
Nikela Papadopoulou (National Technical University of Athens, Greece)
Lena Oden (Argonne National Laboratory, USA)
Pavan Balaji (Argonne National Laboratory, USA)
A performance study of UCX over InfiniBand

ABSTRACT. To achieve application scalability on millions of processes, communication software for large-scale systems needs to provide the appropriate functionality while fulfilling multiple requirements, such as performance, scalability, and portability. UCX is a novel open-source communication middleware with a two-level API design addressing these needs. The lower-level interface, UCT, adds minimal overhead to data transfer but requires a lot of effort from the user. The higher-level interface, UCP, is easier to use but, as a tradeoff, adds some overhead to the communication. Since MPICH will deploy UCX as a middleware, this work focuses on charting the performance of UCX over InfiniBand. We analyze performance shortcomings that stem from the two-level design and the sources of these performance losses. In particular, we target basic functions of UCP, evaluate their performance over InfiniBand, and analyze sources of overheads compared to UCT and Verbs. We propose and evaluate some fixes to minimize these overheads, in order to enhance UCP performance and scalability.

12:00
Alexandros Evangelidis (University of Birmingham, UK)
David Parker (University of Birmingham, UK)
Rami Bahsoon (University of Birmingham, UK)
Performance Modelling and Verification of Cloud-based Auto-Scaling Policies

ABSTRACT. Auto-scaling, a key property of cloud computing, allows application owners to acquire and release resources on demand. However, the shared environment, along with the exponentially large configuration space of available parameters, makes the configuration of auto-scaling policies a challenging task. In particular, it is difficult to quantify, a priori, the impact of a policy on Quality of Service (QoS) provision. To address this problem, we propose a novel approach based on performance modelling and formal verification to produce performance guarantees on particular rule-based auto-scaling policies. We demonstrate the usefulness and efficiency of our model through a detailed validation process on the Amazon EC2 cloud, using two types of load patterns. Our experimental results show that it can be very effective in helping the cloud application owner configure an auto-scaling policy so as to minimise QoS violations.
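To picture what a probabilistic guarantee over a scaling policy looks like, one can encode a rule-based policy as a small discrete-time Markov chain and compute the long-run probability of a violating state. This is a hand-rolled numeric sketch with invented transition probabilities, not the formal verification machinery used in the paper:

```python
# States are the number of running VMs (1, 2, 3); row i gives the
# transition probabilities from state i. Numbers are invented: they
# stand in for a scale-up/scale-down rule reacting to load.
P = [
    [0.70, 0.30, 0.00],   # at 1 VM: scale up with probability 0.3
    [0.10, 0.70, 0.20],
    [0.00, 0.25, 0.75],
]
VIOLATING = {0}           # assume QoS is violated whenever only 1 VM runs

dist = [1.0, 0.0, 0.0]
for _ in range(1000):     # power iteration toward the stationary distribution
    dist = [sum(dist[i] * P[i][j] for i in range(3)) for j in range(3)]

violation = sum(p for s, p in enumerate(dist) if s in VIOLATING)
print(f"long-run QoS-violation probability ~= {violation:.3f}")
```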

11:00-12:30 Session 14B: Storage and I/O (II)
Chair:
Francisco Rodrigo Duro (University Carlos III of Madrid, Spain)
Location: Goya
11:00
Pengfei Xuan (Clemson University, USA)
Feng Luo (Clemson University, USA)
Rong Ge (Clemson University, USA)
Pradip Srimani (Clemson University, USA)
Dynamic Management of In-memory Storage for Efficiently Integrating Compute- and Data-intensive Computing on HPC Systems

ABSTRACT. In order to boost the performance of data-intensive computing on HPC systems, in-memory computing frameworks, such as Apache Spark and Flink, use local DRAM for data storage. Optimizing the memory allocated to data storage is critical to delivering performance to traditional HPC compute jobs and throughput to data-intensive applications sharing the HPC resources. Current practices that statically configure in-memory storage may leave inadequate space for compute jobs or lose the opportunity to utilize more available space for data-intensive applications. In this paper, we explore techniques to dynamically adjust in-memory storage and make the right amount of space available for compute jobs. We have developed a dynamic memory controller, DynIMS, which infers the memory demands of compute tasks online and employs a feedback-based control model to adapt the capacity of in-memory storage. We test DynIMS using mixed HPCC and Spark workloads on an HPC cluster. Experimental results show that DynIMS can achieve up to a 5X performance improvement compared to systems with static memory allocation.
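A minimal sketch of such a feedback loop (the proportional control law, gain, and bounds below are illustrative assumptions, not the DynIMS controller):

```python
# Shrink the in-memory store when compute jobs are starved for memory,
# grow it back when memory sits idle. Gains and limits are invented.
def next_store_capacity(cap_gb, free_gb, demand_gb, k=0.5,
                        cap_min=4.0, cap_max=64.0):
    error = free_gb - demand_gb    # > 0: slack, < 0: compute is starved
    cap_gb += k * error            # proportional adjustment
    return max(cap_min, min(cap_max, cap_gb))

cap = 32.0
for free, demand in [(8, 20), (6, 20), (10, 6), (14, 4)]:
    cap = next_store_capacity(cap, free, demand)
    print(f"in-memory store capacity -> {cap:.1f} GB")
```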

11:30
Junjie Qian (University of Nebraska Lincoln, USA)
Hong Jiang (University of Texas Arlington, USA)
Witawas Srisa-An (University of Nebraska Lincoln, USA)
Sharad Seth (University of Nebraska Lincoln, USA)
Stan Skelton (NetApp Inc, USA)
Joseph Moore (NetApp Inc., USA)
Energy-efficient I/O Thread Schedulers for NVMe SSDs on NUMA

ABSTRACT. Non-volatile memory express (NVMe) based SSDs and the NUMA platform are widely adopted in servers to achieve faster storage speed and more powerful processing capability. As of now, very little research has been conducted to investigate the performance and energy efficiency of the state-of-the-art NUMA architecture integrated with NVMe SSDs, an emerging technology used to host parallel I/O threads. As this technology continues to be widely developed and adopted, we need to understand the runtime behaviors of such systems in order to design software runtime systems that deliver optimal performance while consuming only the necessary amount of energy.

This paper characterizes the runtime behaviors of a Linux-based NUMA system employing multiple NVMe SSDs. Our comprehensive performance and energy-efficiency study using massive numbers of parallel I/O threads shows that the penalty due to CPU contention is much smaller than that due to remote access of NVMe SSDs. Based on this insight, we develop a dynamic "lesser evil" algorithm called ESN to minimize the impact of these two types of penalties. ESN is an energy-efficient, profiling-based I/O thread scheduler for managing I/O threads accessing NVMe SSDs on NUMA systems. Our empirical evaluation shows that ESN can achieve optimal I/O throughput and latency while consuming up to 50% less energy and using fewer CPUs.
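The "lesser evil" decision can be pictured as a simple cost comparison (the penalty constants below are invented for illustration; ESN itself drives the decision with runtime profiling):

```python
# Pick the NUMA node for an I/O thread accessing an SSD attached to
# ssd_node: running elsewhere avoids CPU contention but pays the
# remote-access penalty. Per the paper's insight, contention is
# usually the lesser evil.
LOCAL_CONTENTION_PENALTY = 0.08   # relative slowdown per co-runner
REMOTE_ACCESS_PENALTY = 0.35      # relative slowdown for remote SSD access

def choose_node(ssd_node, load_on):
    costs = {node: LOCAL_CONTENTION_PENALTY * load
             for node, load in load_on.items()}
    for node in costs:
        if node != ssd_node:
            costs[node] += REMOTE_ACCESS_PENALTY
    return min(costs, key=costs.get)

print(choose_node(0, {0: 3, 1: 0}))   # -> 0: tolerate local contention
print(choose_node(0, {0: 10, 1: 0}))  # -> 1: contention finally outweighs
```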

12:00
Moustafa Abdelbaky (Rutgers University, USA)
Javier Diaz-Montes (Rutgers University, USA)
Merve Unuvar (IBM Research, USA)
Melissa Romanus (Rutgers University, USA)
Malgorzata Steinder (IBM Research, USA)
Manish Parashar (Rutgers University, USA)
Enabling Distributed Software-Defined Environments Using Dynamic Infrastructure Service Composition

ABSTRACT. Service-based access models coupled with emerging application deployment technologies are enabling opportunities for realizing highly customized software-defined environments that can support dynamic and data-driven applications. However, these approaches require rethinking traditional resource composition models to support dynamic composition that can adapt to evolving application needs and the state of resources. In this paper, we present a programmable approach to dynamic service composition that leverages Constraint Programming for resource description and implements a software-defined approach to space-time resource composition. The resulting software-defined environment continually adapts to meet objectives/constraints set by the users, applications, and/or resource providers. We present the design and prototype implementation of such a software-defined service composition. Our prototype leverages Docker containers to package applications and facilitate their deployment across distributed infrastructures. We use a cancer informatics workflow to demonstrate the operation of our approach using resources from five different cloud providers that are aggregated on-demand based on user and resource provider constraints.
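The composition step can be pictured as a constraint-satisfaction search. The brute-force sketch below, with invented offers and constraints (the paper uses an actual Constraint Programming engine), picks one infrastructure offer per required service subject to global cost and latency constraints:

```python
from itertools import product

# (provider, cost, latency_ms) per required service; all values invented.
offers = {
    "compute": [("cloudA", 10, 40), ("cloudB", 7, 90)],
    "storage": [("cloudC", 4, 30), ("cloudA", 6, 20)],
}
MAX_COST, MAX_LATENCY = 15, 80

def compose():
    best = None
    for combo in product(*offers.values()):
        cost = sum(o[1] for o in combo)
        if cost <= MAX_COST and all(o[2] <= MAX_LATENCY for o in combo):
            if best is None or cost < best[0]:
                best = (cost, combo)   # keep the cheapest feasible combo
    return best

print(compose())   # cheapest feasible composition, or None if infeasible
```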

11:00-12:30 Session 14C: Industrial Session
Chair:
Javier Garcia Blas (Carlos III University, Spain)
Location: Latina
11:00
Javier Sanchez Rojas (Mellanox, Spain)
Interconnect Your Future: Paving the Road to Exascale
11:45
Harald Odendahl (HO Computer, Germany)
Intel Parallel Studio XE - a powerful set of compilers and tools (not only) for HPC, and a short outlook on Intel's ideas for AI
11:00-12:30 Session 14D: Doctoral Symposium (I)
Chair:
Harold Castro (Communications and Information Technology Group (COMIT), Department of Systems and Computer Engineering, Universidad de Los Andes, Bogotá, Colombia)
Location: Serrano
11:00
Silvina Caíno-Lores (University Carlos III of Madrid, Spain)
Florin Isaila (University Carlos III of Madrid, Spain)
Jesús Carretero (University Carlos III of Madrid, Spain)
Data-Aware Support for Hybrid HPC and Big Data Applications

ABSTRACT. Nowadays there is rising interest in bridging the gap between Big Data application models and data-intensive HPC. This work explores the effects that Big Data-inspired paradigms could have on current scientific applications through the evaluation of a real-world application from the hydrology domain. This evaluation yielded insights into the key aspects of the HPC and Big Data paradigms that have made them successful in their respective worlds. With this information, we established a research roadmap for building a platform suitable for hybrid HPC applications, with a focus on efficient data management and fault tolerance.

11:20
Carlos André Batista De Carvalho (Federal University of Piaui, Brazil)
Miguel Franklin De Castro (Federal University of Ceara, Brazil)
Rossana Maria De Castro Andrade (Federal University of Ceara, Brazil)
Secure cloud storage service for detection of security violations

ABSTRACT. A cloud storage service implements security mechanisms to protect users' data. Moreover, due to the loss of control over the cloud infrastructure, it is essential to demonstrate its security, increasing the trust in and transparency of cloud services. However, an analysis of the literature reveals flaws in existing solutions, which ignore the customers' needs and do not identify all attacks. We therefore propose a secure storage service for cloud computing that addresses these issues by combining security mechanisms. The proposed solution includes auditing and monitoring mechanisms to detect and prove violations of security properties, and Colored Petri Nets (CPNs) are used for evaluation. As a result, attacks are detected and the provider cannot deny the identified violations.

11:40
Van Long Tran (Institut Mines-Télécom, Télécom SudParis, France)
Éric Renault (Institut Mines-Télécom, Télécom SudParis, France)
Viet Hai Ha (Hue University of Education, Vietnam)
Optimization of checkpoints and execution model for an implementation of OpenMP on distributed memory architectures

ABSTRACT. CAPE (Checkpointing-Aided Parallel Execution) is an approach for porting OpenMP programs to distributed memory architectures such as cluster, grid, or cloud systems. It provides a set of prototypes and functions to automatically translate and execute OpenMP programs on distributed memory systems based on checkpointing techniques. This solution has been shown to achieve high performance and complete compatibility with OpenMP. However, it is still at the research and development stage, so many functions need to be added, and some techniques and models need to be improved.

This paper presents the approaches and techniques that have been applied, and that will be applied, to optimize the checkpoints and execution model of CAPE.
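The checkpointing idea at the heart of CAPE can be pictured with a toy incremental checkpoint: capture only the bytes a worker modified and merge that delta into the master's memory image (the format and granularity below are illustrative assumptions, not CAPE's actual checkpointer):

```python
# Toy incremental checkpoint: diff two memory images byte by byte,
# ship only the modified bytes, and apply them at the receiver.
def delta(before: bytes, after: bytes):
    return [(i, after[i]) for i in range(len(after)) if after[i] != before[i]]

def merge(image: bytearray, d):
    for i, b in d:
        image[i] = b

master = bytearray(b"AAAAAAAA")
worker = bytearray(master)
worker[2:5] = b"XYZ"                 # worker executes its chunk of work
d = delta(bytes(master), bytes(worker))
print("delta:", d)                   # only the 3 modified bytes travel
merge(master, d)
print(master)                        # bytearray(b'AAXYZAAA')
```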

12:00
Shashank Shekhar (Vanderbilt University, USA)
Aniruddha Gokhale (Vanderbilt University, USA)
Dynamic Resource Management Across Cloud-Edge Resources for Performance-Sensitive Applications

ABSTRACT. A large number of modern applications and systems are cloud-hosted. However, limitations in performance assurances from the cloud, and the longer and often unpredictable end-to-end network latencies between the end user and the cloud, can be detrimental to the response-time requirements of applications, specifically those with stringent Quality of Service (QoS) requirements. Although edge resources, such as cloudlets, may alleviate some of the latency concerns, there is a general lack of mechanisms that can dynamically manage resources across the cloud-edge spectrum. To address these gaps, this research proposes Dynamic Data Driven Cloud and Edge Systems (D3CES), which uses measurement data collected by adaptively instrumenting the cloud and edge resources to learn and enhance models of the distributed resource pool, and in turn uses these models in a feedback loop to make effective resource management decisions to host applications and deliver their QoS properties. D3CES is being evaluated in the context of a variety of cyber-physical systems, such as smart city, online gaming, and augmented reality applications.

12:30-14:00 Lunch Break
14:00-16:00 Session 15A: Scheduling and Resource Management (III)
Chair:
Vladimir Vlassov (Royal Institute of Technology (KTH), School for Information and Communication Technology, Sweden)
Location: Serrano
14:00
Sevil Dräxler (Paderborn University, Germany)
Holger Karl (Paderborn University, Germany)
Zoltán Ádám Mann (University of Duisburg-Essen, Germany)
Joint Optimization of Scaling and Placement of Virtual Network Services

ABSTRACT. Management of complex network services requires flexible and efficient service provisioning as well as optimized handling of continuous changes in the workload of the service. To adapt to changes in the demand, service components need to be replicated (scaling) and allocated to physical resources (placement) dynamically. In this paper, we propose a fully automated approach to the joint optimization problem of scaling and placement, enabling quick reaction to changes. We formalize the problem, analyze its complexity, and develop two algorithms to solve it. Extensive empirical results show the applicability and effectiveness of the proposed approach.

14:20
Julien Herrmann (Georgia Institute of Technology, USA)
Jonathan Kho (Georgia Institute of Technology, USA)
Bora Ucar (CNRS and LIP ENS Lyon, France)
Kamer Kaya (Sabancı University, Turkey)
Umit Catalyurek (Georgia Institute of Technology, USA)
Acyclic Partitioning of Large-Scale Directed Acyclic Graphs

ABSTRACT. Finding a good partition of a computational directed acyclic graph associated with an algorithm can help find an execution pattern improving data locality, conduct an analysis of data movement, and expose parallel steps. The partition is required to be acyclic, i.e., the inter-part edges between the vertices from different parts should preserve an acyclic dependency structure among the parts. In this work, we adopt the multilevel approach with coarsening, initial partitioning, and refinement phases for acyclic partitioning of directed acyclic graphs, and develop a direct k-way partitioning scheme and a recursive-bisection-based one. To the best of our knowledge, no such scheme exists in the literature. To ensure the acyclicity of the partition at all times, we propose novel and efficient coarsening and refinement heuristics. The quality of the computed acyclic partitions is assessed by computing the edge cut, the total volume of communication between components, and the critical-path latencies. We use the solutions returned by well-known undirected graph partitioners as a baseline to evaluate our acyclic partitioner, bearing in mind that the solution space is more restricted in our problem. The experiments are run on graphs arising from linear algebra applications.
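The acyclicity constraint itself is easy to state in code: contract each part into a single node and check the resulting quotient graph for cycles. A small standalone sketch (not the paper's partitioner, which maintains acyclicity incrementally during coarsening and refinement):

```python
def quotient_is_acyclic(edges, part):
    """edges: iterable of (u, v) pairs; part: dict vertex -> part id."""
    adj = {}
    for u, v in edges:                       # build the quotient graph
        a, b = part[u], part[v]
        if a != b:
            adj.setdefault(a, set()).add(b)
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {p: WHITE for p in set(part.values())}

    def dfs(p):                              # standard cycle detection
        color[p] = GRAY
        for q in adj.get(p, ()):
            if color[q] == GRAY or (color[q] == WHITE and not dfs(q)):
                return False
        color[p] = BLACK
        return True

    return all(dfs(p) for p in list(color) if color[p] == WHITE)

edges = [(0, 1), (1, 2), (2, 3)]
parts = {0: "A", 1: "A", 2: "B", 3: "B"}
print(quotient_is_acyclic(edges, parts))             # True: A -> B only
print(quotient_is_acyclic(edges + [(3, 0)], parts))  # False: A <-> B cycle
```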

14:40
Pierre-Francois Dutot (Université Grenoble Alpes, France)
Yiannis Georgiou (ATOS/BULL, France)
David Glesser (Bull Atos Technologies, France)
Laurent Lefevre (INRIA, France)
Millian Poquet (Univ. Grenoble, MOAIS, LIG, Inria, France)
Issam Rais (INRIA, France)
Towards Energy Budget Control in HPC

ABSTRACT. Energy consumption has become one of the most critical issues in the evolution of High Performance Computing (HPC) systems. Controlling the energy consumption of HPC platforms is not only a way to control cost but also a step forward on the road towards exaflops. Power capping is a widely studied technique that guarantees that the platform will not exceed a certain power threshold instantaneously, but it gives no flexibility to adapt job scheduling to longer-term energy budget control.

We propose a job scheduling mechanism that extends the backfilling algorithm to make it energy-aware. Simultaneously, we adapt resource management with a node shutdown technique to minimize energy consumption whenever needed. This combination enables efficient control of a cluster's energy consumption budget over a period of time. The technique is evaluated, validated, and compared with various alternatives through extensive simulations. Experimental results show high system utilization and limited bounded slowdown, along with interesting outcomes in energy efficiency, while respecting an energy budget during a particular time period.
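The core admission test of an energy-aware backfilling step can be sketched as follows (the job, power figures, and budget are invented; the paper's mechanism also covers node shutdown and scheduling details omitted here):

```python
# Backfill a candidate job only if its projected energy still fits in
# what remains of the period's energy budget.
def can_backfill(job, used_joules, budget_joules):
    projected = job["nodes"] * job["watts_per_node"] * job["runtime_s"]
    return used_joules + projected <= budget_joules

job = {"nodes": 16, "watts_per_node": 200, "runtime_s": 3600}  # ~1.15e7 J
print(can_backfill(job, used_joules=5.0e8, budget_joules=5.1e8))  # False
print(can_backfill(job, used_joules=4.0e8, budget_joules=5.1e8))  # True
```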

15:00
Markus Lumpe (Swinburne University of Technology, Australia)
Mohan Baruwal Chhetri (Swinburne University of Technology, Australia)
Bao Vo (Swinburne University of Technology, Australia)
Ryszard Kowalczyk (Swinburne University of Technology, Australia)
On Estimating Minimum Bids for Amazon EC2 Spot Instances

ABSTRACT. Consumers can realize significant cost savings by procuring resources from computational spot markets such as Amazon Elastic Compute Cloud (EC2) Spot Instances. They can take advantage of the price differentials across time slots, regions, and instance types to minimize the total cost of running their applications on the cloud. However, spot markets are inherently volatile and dynamic, as a consequence of which spot prices change continuously. As such, prospective bidders can benefit from intelligent insights into the spot market dynamics that can help them make more informed bidding decisions. To enable this, we propose a descriptive-statistics approach for the analysis of Amazon EC2 Spot markets to detect typical pricing patterns, including the presence of seasonal components, extremes, and trends. We use three statistical measures: the Gini coefficient, the Theil index, and the exponentially weighted moving average. We also devise a model for estimating minimum bids such that Spot instances will run for specified durations with a probability greater than a set value, based on different look-back periods. Experimental results show that our estimation yields, on average, a bidding strategy that can reliably secure an instance at least 80% of the time at a minimum target guarantee between 50% and 95%.
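Two of the named statistics, and a naive bid rule built on them, can be computed over a toy price series (the bid rule is a simplification for illustration, not the authors' estimator):

```python
# Gini coefficient and EWMA over a toy spot price history.
def gini(prices):
    xs = sorted(prices)
    n, s = len(xs), sum(xs)
    cum = sum((i + 1) * x for i, x in enumerate(xs))
    return (2 * cum) / (n * s) - (n + 1) / n

def ewma(prices, alpha=0.3):
    avg = prices[0]
    for p in prices[1:]:
        avg = alpha * p + (1 - alpha) * avg
    return avg

history = [0.031, 0.030, 0.034, 0.052, 0.033, 0.031, 0.030, 0.045]
g, smoothed = gini(history), ewma(history)
# Naive rule: bid the smoothed price plus a margin that widens with
# price inequality (more dispersion -> bid higher to survive spikes).
bid = smoothed * (1 + 2 * g)
print(f"Gini={g:.3f}  EWMA={smoothed:.4f}  suggested bid={bid:.4f}")
```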

15:20
Mennan Selimi (Universitat Politècnica de Catalunya, Spain)
Llorenç Cerdà-Alabern (UPC, Spain)
Marc Sanchez Artigas (Universitat Rovira i Virgili, Spain)
Felix Freitag (UPC, Spain)
Luis Veiga (IST-INESC ID, Portugal)
Practical Service Placement Approach for Microservices Architecture

ABSTRACT. Community networks (CNs) have gained momentum in the last few years with the increasing number of spontaneously deployed WiFi hotspots and home networks. These networks, owned and managed by volunteers, offer various services to their members and to the public. To reduce the complexity of service deployment, community micro-clouds have recently emerged as a promising enabler for the delivery of cloud services to community users. By putting services closer to consumers, micro-clouds pursue not only better service performance, but also a low entry barrier for the deployment of mainstream Internet services within the CN. Unfortunately, provisioning these services is not so simple. Due to the large and irregular topology and the high software and hardware diversity of CNs, it requires a careful placement of micro-clouds and services over the network.

To achieve this, this paper proposes leveraging state information about the network to inform service placement decisions, and doing so through a fast heuristic algorithm, which is vital to react quickly to changing conditions. To evaluate its performance, we compare our heuristic with one based on random placement in Guifi.net, the biggest CN worldwide. Our experimental results show that our heuristic consistently outperforms random placement by 211% in terms of bandwidth gain. We quantify the benefits of our heuristic on a real live video-streaming service and demonstrate that video chunk losses decrease significantly, attaining a 37% decrease in the packet loss rate. Further, using a popular Web 2.0 service, we demonstrate that client response times decrease by up to an order of magnitude when using our heuristic.

14:00-16:00 Session 15B: Security, Privacy and Reliability (I)
Chair:
Leonardo Arturo Bautista Gomez (Barcelona Supercomputing Center, Spain)
Location: Velazquez
14:00
Deepak Tosh (Norfolk State University, USA)
Sachin Shetty (Old Dominion University, USA)
Xueping Liang (Tennessee State University, USA)
Charles Kamhoua (AFRL, USA)
Kevin Kwiat (AFRL, USA)
Laurent Njilla (AFRL, USA)
Security Implications of Blockchain Cloud with Analysis of Block Withholding Attack

ABSTRACT. Blockchain technology has emerged as an attractive solution to address performance and security issues in distributed systems. Blockchain's public and distributed peer-to-peer ledger capability benefits cloud computing services which require functions such as assured data provenance, auditing, management of digital assets, and distributed consensus. Blockchain's underlying consensus mechanism makes it possible to build a tamper-proof environment, where transactions on any digital assets are verified by a set of authentic participants or miners. Through the use of strong cryptographic methods, blocks of transactions are chained together to make the records immutable. However, achieving consensus demands computational power from the miners in exchange for a handsome reward. Therefore, greedy miners always try to exploit the system by augmenting their mining power. In this paper, we first discuss blockchain's capability to provide assured data provenance in the cloud and present vulnerabilities in the blockchain cloud. We model the block withholding (BWH) attack in a blockchain cloud considering distinct pool reward mechanisms. A BWH attack provides a rogue miner with ample resources in the blockchain cloud for disrupting honest miners' mining efforts, which we verify through simulations.
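The profitability of block withholding can be seen in a toy expected-revenue model (a standard back-of-the-envelope analysis with invented power fractions, not the paper's pool-reward models):

```python
# The rogue miner sends part of its power into a victim pool, submits
# only partial proofs-of-work there, and collects pool shares while
# contributing no blocks. Withheld power mines nothing network-wide.
def revenues(honest_pool, attacker_solo, attacker_infiltrating, others):
    effective = honest_pool + attacker_solo + others
    pool_blocks = honest_pool / effective          # pool's share of blocks
    pool_members = honest_pool + attacker_infiltrating
    attacker = (attacker_solo / effective
                + pool_blocks * attacker_infiltrating / pool_members)
    victim = pool_blocks * honest_pool / pool_members
    return round(attacker, 4), round(victim, 4)

print(revenues(0.30, 0.15, 0.05, 0.50))  # attack: attacker gains
print(revenues(0.30, 0.20, 0.00, 0.50))  # honest baseline for comparison
```

With these numbers, splitting 0.05 of the attacker's 0.20 power into the pool raises its revenue above the honest baseline while depressing the victim pool's share, which is precisely the incentive problem the abstract describes.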

14:25
Xueping Liang (Tennessee State University, USA)
Sachin Shetty (Old Dominion University, USA)
Deepak Tosh (Tennessee State University, USA)
Charles Kamhoua (Cyber Assurance Branch, Air Force Research Laboratory, Rome, NY, USA)
Kevin Kwiat (Cyber Assurance Branch, Air Force Research Laboratory, Rome, NY, USA)
Laurent Njilla (Cyber Assurance Branch, Air Force Research Laboratory, Rome, NY, USA)
ProvChain: A Blockchain-based Data Provenance Architecture in Cloud Environment with Enhanced Privacy and Availability

ABSTRACT. Cloud data provenance is metadata that records the history of the creation of, and the operations performed on, a cloud data object. Secure data provenance is crucial for data accountability, forensics, and privacy. In this paper, we propose a decentralized and trusted cloud data provenance architecture using blockchain technology. Blockchain-based data provenance can provide tamper-proof records, enable transparency of data accountability in the cloud, and help to enhance the privacy and availability of the provenance data. We focus on the cloud storage scenario and choose the cloud file as the data unit for detecting user operations and collecting provenance data. We design and implement ProvChain, an architecture to collect and verify cloud data provenance, by embedding the provenance data into blockchain transactions. ProvChain operates in three main phases: (1) provenance data collection, (2) provenance data storage, and (3) provenance data validation. Performance evaluation results show that ProvChain provides security features including tamper-proof provenance, user privacy, and reliability with a relatively low overhead for the cloud storage application.
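The embedding step can be pictured as hashing each provenance record and anchoring the digest in a blockchain transaction, so that re-hashing later detects tampering (the record layout below is invented for illustration):

```python
import hashlib
import json
import time

# Hash a provenance record; the digest is what would be written into a
# blockchain transaction during the collection/storage phases.
def provenance_digest(user, filename, operation):
    record = {"user": user, "file": filename,
              "op": operation, "ts": int(time.time())}
    digest = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()).hexdigest()
    return record, digest

record, digest = provenance_digest("alice", "report.csv", "write")
print("anchor in blockchain tx:", digest[:16], "...")

# Validation phase: any modification to the record changes the digest.
record["op"] = "delete"
tampered = hashlib.sha256(
    json.dumps(record, sort_keys=True).encode()).hexdigest()
print("tamper detected:", tampered != digest)   # True
```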

14:50
Lina Jia (Institute of Information Engineering, China)
Min Zhu (Institute of Information Engineering, China)
Bibo Tu (Institute of Information Engineering, China)
T-VMI: Trusted Virtual Machine Introspection in Cloud Environments

ABSTRACT. Nowadays, the security vulnerabilities of the cloud environment put Virtual Machine Introspection (VMI) at risk: once attackers subvert any layer of the cloud environment, such as the host, the virtual machine manager (VMM), or QEMU, VMI is undoubtedly exposed to those attackers too. Nearly all existing VMI techniques implicitly assume that both the VMM, through which VMI accesses specific VM data, and the host on which VMI is running are non-malicious and immutable. Unfortunately, this assumption can be violated given the growing shortage of security in the cloud environment. Once the VMM or host is exploited, attackers can tamper with the code or hijack the data of VMI and then falsify VM information and certifications to cloud system administrators who are trying to ensure the security of a specific VM on a certain compute node. This paper proposes a new trusted VMI monitoring framework, T-VMI, which avoids malicious subversion of the VMI routine. T-VMI guarantees the integrity of VMI code using isolation, and the correctness of VMI data using high-privilege-level instructions and an appropriate trap mechanism. The model is evaluated in a simulation environment using ARM Foundation Model 8.0 and has been demonstrated on a real ARMv8 JUNO-r0 development board. We conducted comprehensive experiments covering both effectiveness and performance, and the results and analysis show that T-VMI achieves the expected effectiveness at an acceptable performance cost.

15:15
Manel Mrabet (ENSI, Tunisia)
Yosra Ben Saied (ENSI, Tunisia)
Leila Azouz Saidane (ENSI, Tunisia)
Modeling correlation between QoS attributes for trust computation in cloud computing environments

ABSTRACT. Trust and reputation systems help cloud users improve cloud provider selection by designating only reliable providers. However, there is a lack of effective and robust trust management systems (TMS) in cloud computing. In this paper, we provide an extensive analysis of the existing approaches for establishing trust, examining their main characteristics and limitations. From this analysis we identify three major categories around which existing studies are organized, namely trust bootstrapping, trust computation, and the robustness of trust systems, and we discuss each of them. We then propose a set of guidelines to be considered in the design of more effective and robust TMS that fit the requirements of cloud environments.

15:40
Grisha Weintraub (The Open University of Israel, Israel)
Ehud Gudes (Ben-Gurion University of the Negev, Israel)
Crowdsourced Data Integrity Verification for Key-Value Stores in the Cloud

ABSTRACT. Thanks to their high availability, scalability, and usability, cloud databases have become one of the dominant cloud services. However, since cloud users do not physically possess their data, data integrity may be at risk. In this paper, we present a novel protocol that utilizes the crowdsourcing paradigm to provide practical data integrity assurance in key-value cloud databases. The main advantage of our protocol over previous work is its high applicability: as opposed to existing approaches, our scheme does not require any system changes on the cloud side and thus can be applied directly to any existing system. We demonstrate the feasibility of our scheme with a prototype implementation and its evaluation.
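One way to picture crowd-based read verification (a generic illustration of the idea, not the paper's protocol): several independent clients read the same key and vote on the digest of the returned value, so a divergent answer exposes a tampered or stale copy.

```python
import hashlib
from collections import Counter

def digest(value: bytes) -> str:
    return hashlib.sha256(value).hexdigest()

def crowd_verify(reads):
    """reads: the raw values returned to each participating client."""
    votes = Counter(digest(v) for v in reads)
    winner, count = votes.most_common(1)[0]
    return count == len(reads), winner   # (unanimous?, majority digest)

ok, _ = crowd_verify([b"v1", b"v1", b"v1"])
print("integrity holds:", ok)            # True
ok, _ = crowd_verify([b"v1", b"v1", b"TAMPERED"])
print("integrity holds:", ok)            # False -> raise an alarm
```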

14:00-16:00 Session 15C: Tutorial: Programming Distributed Platforms with PyCOMPSs
Chairs:
Rosa Badía (Barcelona Supercomputing Center, Spain)
Javier Conejero (BSC, Spain)
Location: Latina
16:00-16:30 Coffee Break
16:30-18:00 Session 16A: Security, Privacy and Reliability (II)
Chair:
Kewei Sha (University of Houston - Clear Lake, USA)
Location: Serrano
16:30
Rashid Tahir (University of Illinois Urbana-Champaign, USA)
Ali Raza (Lahore University of Management Sciences, Pakistan)
Mazhar Naqvi (Lahore University of Management Sciences, Pakistan)
Fareed Zaffar (Lahore University of Management Sciences, Pakistan)
Matthew Caesar (University of Illinois at Urbana-Champaign, USA)
An Anomaly Detection Fabric for Clouds Based on Collaborative VM Communities

ABSTRACT. The vast attack surface of clouds presents a challenge in deploying scalable and effective defenses. Traditional security mechanisms, which work from "inside" the VM, fail to provide strong protection because attackers can bypass them easily. The only available option is to provide security from the layer below the VM, i.e., the hypervisor. Previous works that attempt to secure VMs from the "outside" either incur substantial space or compute overheads, making them slow and impractical, or require modifications to the OS or the application codebase. To address these issues, we propose an anomaly detection fabric for clouds based on system-call monitoring, which compresses the stream of system calls at the source, making the system scalable and near real-time. Our system requires no modifications to the guest OS or the application, making it ideal for the data center setting. Additionally, for robust and early detection of threats, we leverage the notion of VM/container communities that share information about attacks in their early stages to provide immunity to the entire deployment. We make certain aspects of the system flexible so that vendors can tune metrics to offer customized protection to clients based on their workload types. A detailed evaluation of a prototype implementation on KVM substantiates our claims.
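A compressed-stream anomaly score of this general kind can be sketched with system-call n-grams (the window size, traces, and scoring rule are illustrative assumptions, not the paper's detector):

```python
from collections import Counter

# Learn the 2-gram vocabulary of normal syscall traces, then flag
# windows dominated by 2-grams never seen during training.
def ngrams(trace, n=2):
    return [tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)]

normal = ["open", "read", "read", "close", "open", "read", "close"]
known = Counter(ngrams(normal))

def anomaly_score(window):
    grams = ngrams(window)
    unseen = sum(1 for g in grams if g not in known)
    return unseen / max(len(grams), 1)   # fraction of unfamiliar behavior

print(anomaly_score(["open", "read", "close"]))             # 0.0 -> benign
print(anomaly_score(["open", "mmap", "mprotect", "exec"]))  # 1.0 -> alert
```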

16:55
Sheng Di (Argonne National Laboratory, USA)
Rinku Gupta (Argonne National Laboratory, USA)
Marc Snir (University of Illinois Urbana-Champaign, USA)
Eric Pershey (Argonne National Laboratory, USA)
Franck Cappello (Argonne National Laboratory, USA)
LogAider: A tool for mining potential correlations of HPC log events

ABSTRACT. Each of today's large-scale supercomputers produces a large amount of log data. It is crucial to explore the various potential correlations of fatal events in order to understand their causality and improve the working efficiency of system administrators. To this end, we developed a toolkit named LogAider, which can reveal three types of potential correlations: (1) across-field correlation, (2) spatial correlation, and (3) temporal correlation. Across-field correlation refers to the statistical correlation across fields within a log or across multiple logs based on probabilistic analysis. For analyzing the spatial correlation of events, we developed a generic, easy-to-use visualizer that can display any events queried by users on a system machine graph. LogAider can also effectively mine spatial correlations with an optimized K-means clustering algorithm over a torus network topology. It is also able to disclose temporal correlations (or error propagations) over a certain period inside a log or across multiple logs, based on an effective similarity analysis strategy. We assessed LogAider using the one-year Reliability-Availability-Serviceability (RAS) log of the Mira system (one of the world's most powerful supercomputers), as well as its job log. We find that LogAider is very helpful for revealing the potential correlations of system fatal events and job events, mining across-field correlations with both precision and recall of 99.9-100%, as well as precisely detecting temporal correlations with high similarity (up to 95%) to the ground truth.

17:20
Omer Subasi (Technical University of Catalonia/Barcelona Supercomputing Center, Spain)
Gulay Yalcin (Abdullah Gul University, Turkey)
Ferad Zyulkyarov (Barcelona Supercomputing Center, Spain)
Osman Unsal (Barcelona Supercomputing Center, Spain)
Jesus Labarta (Barcelona Supercomputing Center, Spain)
Designing and Modelling Selective Replication for Fault-tolerant HPC Applications

ABSTRACT. Fail-stop errors and Silent Data Corruptions (SDCs) are two types of errors that threaten the execution of High Performance Computing (HPC) applications. There are studies that address fail-stop errors and studies that address SDCs, but few address both types of errors together. In this paper we propose a software-based selective process replication technique for HPC applications that handles both fail-stop errors and SDCs. Since replicating all application processes can be costly in terms of resources, we develop a runtime-based technique for selective process replication. Selective replication provides an opportunity to meet HPC reliability targets while decreasing resource costs. Our technique is low-overhead, automatic, and completely transparent to the user.

To achieve efficient selective replication, we first develop a formal framework in which we model the reliability of HPC programs in general, and of process replication in particular, via a theoretical model based on Markov chains. We then design and implement a selective replication heuristic, based on our Markov model, that estimates the reliability of applications at runtime. Our heuristic, called Target Rep, selects the processes that would increase the reliability of the application the most while obeying an overall system-wide process replication percentage threshold; this heuristic is useful in systems with limited spare resources for replication. Our results show that it performs close to the optimal solutions: the Target Rep heuristic stays within 5% of the optimum at a 50% target replication percentage.
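A greedy flavour of the selection problem can be sketched as follows (the failure probabilities are invented, and the paper drives selection with its Markov-chain model rather than this static ranking):

```python
# Replicate the least reliable processes first until the allowed
# replication percentage is spent, then compare whole-app reliability.
def select_replicas(fail_prob, budget_pct):
    k = int(len(fail_prob) * budget_pct / 100)
    ranked = sorted(range(len(fail_prob)),
                    key=lambda i: fail_prob[i], reverse=True)
    return set(ranked[:k])

def app_reliability(fail_prob, replicated):
    r = 1.0
    for i, p in enumerate(fail_prob):
        # A replicated process survives unless both copies fail.
        r *= (1 - p * p) if i in replicated else (1 - p)
    return r

probs = [0.02, 0.10, 0.01, 0.08, 0.03, 0.12]
chosen = select_replicas(probs, budget_pct=50)   # replicate 3 of 6
print("replicate:", sorted(chosen))
print("reliability without:", round(app_reliability(probs, set()), 4))
print("reliability with   :", round(app_reliability(probs, chosen), 4))
```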

16:30-18:00 Session 16B: Programming Models and Runtime Systems (II)
Chair:
José D. García-Sánchez (University Carlos III, Madrid, Spain)
Location: Velazquez
16:30
Huansong Fu (Florida State University, USA)
Manjunath Gorentla Venkata (Oak Ridge National Laboratory, USA)
Ahana Roy Choudhury (Florida State University, USA)
Neena Imam (Oak Ridge National Laboratory, USA)
Weikuan Yu (Florida State University, USA)
High-Performance Key-Value Store On OpenSHMEM

ABSTRACT. Recently, there has been increasing interest in enabling fast data analytics by leveraging system capabilities of large-scale high-performance computing (HPC) systems. OpenSHMEM is a popular runtime system on HPC systems that has been used for large-scale compute-intensive scientific applications. In this paper, we propose to leverage OpenSHMEM to design a distributed in-memory key-value store for fast data analytics. Accordingly, we have developed SHMEMCache on top of OpenSHMEM to leverage its symmetric global memory, efficient one-sided communication operations, and general portability. We have also evaluated SHMEMCache through extensive experimental studies. Our results show that SHMEMCache accomplishes significant performance improvements in latency and throughput over the original Memcached. Our initial scalability test on the Titan supercomputer has also demonstrated that SHMEMCache can scale to 256 nodes.

16:50
Sangkuen Lee (Oak Ridge National Laboratory, USA)
Hyogi Sim (Oak Ridge National Laboratory, USA)
Youngjae Kim (Sogang University, South Korea)
Sudharshan S. Vazhkudai (Oak Ridge National Laboratory, USA)
AnalyzeThat: A Programmable Shared-Memory System for an Array of Processing-In-Memory Devices

ABSTRACT. Processing In Memory (PIM), the concept of integrating processing directly with memory, has been attracting a lot of attention, since PIM can help overcome the throughput limitation caused by data movement between the CPU and memory. The challenge, however, is that it requires programmers to have a deep understanding of the PIM architecture to maximize benefits such as data locality and parallel thread execution on multiple PIM devices. In this study, we present AnalyzeThat, a programmable shared-memory system for parallel data processing with PIM devices. Thematic to AnalyzeThat is a rich PIM-Aware Data Structure (PADS), an encapsulation that integrally ties together the data, the analysis tasks, and the runtime needed to interface with the PIM device array. The PADS abstraction provides (i) a sophisticated key-value data container that allows programmers to easily store data on multiple PIMs, (ii) a suite of parallel operations with which users can easily implement data analysis applications, and (iii) a runtime, hidden from programmers, which provides the mechanisms needed to overlay both the data and the tasks on the PIM device array in an intelligent fashion, based on PIM-specific information collected from the hardware. We have developed a PIM emulation framework for AnalyzeThat. Our experimental evaluation with representative data analytics applications suggests that the proposed system can significantly reduce the PIM programming effort without losing the technology's benefits.

17:10
Akihiro Tabuchi (Graduate School of Systems and Information Engineering, University of Tsukuba, Japan)
Masahiro Nakao (RIKEN Advanced Institute for Computational Science, Japan)
Hitoshi Murai (RIKEN, Japan)
Taisuke Boku (University of Tsukuba, Japan)
Mitsuhisa Sato (RIKEN AICS/ University of Tsukuba, Japan)
Implementation and Evaluation of One-sided PGAS Communication in XcalableACC for Accelerated Clusters

ABSTRACT. Clusters equipped with accelerators such as GPUs and MICs are widely used. For such clusters, programmers write their applications by combining MPI with one of the available accelerator programming models. In particular, OpenACC enables programmers to develop their applications relatively easily, but productivity remains low due to complex MPI programming. XcalableACC (XACC) is a new programming model that is an "orthogonal" integration of the Partitioned Global Address Space (PGAS) language XcalableMP (XMP) and OpenACC. While XMP enables distributed-memory programming in both global-view and local-view models, OpenACC allows offloading operations to a set of accelerators. In the local-view model, programmers can describe communication with the coarray features adopted from Fortran 2008, and we extend them to communication between accelerators. We have designed and implemented an XACC compiler for NVIDIA GPUs and evaluated its performance and productivity using two benchmarks, the Himeno benchmark and NAS Parallel Benchmarks (NPB) CG. In the local-view model, the performance of the XACC versions of the Himeno benchmark and NPB-CG reaches over 87% and 96%, respectively, of the MPI + OpenACC versions. Moreover, using non-blocking communication raises the performance of the local-view version of the Himeno benchmark to over 95%. From the viewpoint of productivity, the local-view model provides an intuitive array assignment statement for communication.

17:30
Olivier Aumage (Inria, LaBRI, France)
Julien Bigot (Maison de la Simulation, CEA, CNRS, Univ. Paris-Sud, UVSQ, Université Paris-Saclay, France)
Hélène Coullon (Univ. Lyon, Inria, CNRS, ENS de Lyon, Univ. Claude-Bernard Lyon 1, LIP, France)
Christian Pérez (Univ. Lyon, Inria, CNRS, ENS de Lyon, Univ. Claude-Bernard Lyon 1, LIP, France)
Jérôme Richard (Univ. Lyon, Inria, CNRS, ENS de Lyon, Univ. Claude-Bernard Lyon 1, LIP, France)
Combining Both a Component Model and a Task-based Model for HPC Applications: a Feasibility Study on GYSELA

ABSTRACT. This paper studies the feasibility of efficiently combining a software component model with a task-based model. Task-based models are known to enable efficient execution on recent HPC computing nodes, while component models ease the separation of concerns in applications and thus improve their modularity and adaptability. This paper describes a prototype version of the COMET programming model combining concepts of task-based and component models, and a preliminary version of the COMET runtime built on top of StarPU and L2C. Evaluations of the approach have been conducted on a real-world use case: the analysis of a sub-part of the production application GYSELA. Results show that the approach is feasible and that it enables easy composition of independent software codes without introducing overheads. Performance results are equivalent to those obtained with a plain OpenMP-based implementation.

16:30-18:00 Session 16C: Doctoral Symposium (II)
Chair:
Harold Castro (Communications and Information Technology Group (COMIT), Department of Systems and Computer Engineering, Universidad de Los Andes, Bogotá, Colombia)
Location: Serrano
16:30
Moustafa Abdelbaky (Rutgers University, USA)
Javier Diaz-Montes (Rutgers University, USA)
Manish Parashar (Rutgers University, USA)
Towards Distributed Software-Defined Environments

ABSTRACT. Service-based access models coupled with recent advances in application deployment technologies are enabling opportunities for realizing highly customized software-defined environments that can achieve new levels of efficiency and can support emerging dynamic and data-driven applications. However, achieving this vision requires rethinking traditional federation models to support dynamic (and opportunistic) composition of services that can adapt to evolving application needs and the state of resources. The goal of this work is to provide a programmable dynamic service composition framework that uses software-defined environment concepts to drive the process of dynamically federating infrastructure services from multiple providers. The resulting software-defined federation autonomously adapts the federated composition over the application lifecycle to meet objectives and constraints set by the users, applications, and/or resource providers. We present two different approaches for programming resources and controlling the federation process, one based on a rule engine and another that leverages constraint programming. The preliminary results demonstrate the framework's operation and performance using simulations and real experiments running Docker containers across multiple clouds.

17:00
Truong Thao Nguyen (SOKENDAI (The Graduate University for Advanced Studies), Japan)
Michihiro Koibuchi (National Institute of Informatics, Japan)
Cable-geometric error-prone approach for low-latency interconnection networks

ABSTRACT. The interconnection network is a main concern in the architecture design of highly parallel systems such as high-density data centers and supercomputers that reach millions of endpoints, e.g., 10M cores for the Sunway TaihuLight system. As the number of endpoints of such systems has gradually increased to meet higher computing and storage demand, the interconnection network is required to provide low latency and high communication bandwidth, i.e., less than 1 microsecond of latency across the system with link bandwidth greater than 100 GB/s. In this low-latency design context, our primary aim is to provide a novel solution based on two technology-driven approaches: (1) a cable-geometric small-world network topology with custom routing, and (2) the use of an FEC (forward error correction)-free, error-prone mechanism for high-speed, low-reliability links. Both can yield a logarithmic-diameter, low-radix network with low-latency, high-bandwidth switches, although approximate computing or ABFT (algorithm-based fault tolerance) design is required for parallel applications.

17:30
Carlos Reaño (Technical University of Valencia, Spain)
Federico Silla (Technical University of Valencia, Spain)
Jose Duato (Universidad Politecnica de Valencia, Spain)
Enhancing the rCUDA Remote GPU Virtualization Framework: from a Prototype to a Production Solution

ABSTRACT. The use of hardware accelerators to increase the performance of parallel applications is very common nowadays. For a number of reasons, however, access to local accelerators is not always feasible (e.g., lack of space or cost). It may also be the case that some applications benefit from having access to more accelerators than is physically possible. To address all these concerns, middleware offering access not only to local but also to remote accelerators has appeared. This paper presents a high-level summary of a dissertation focused on enhancing one such middleware, called rCUDA.

16:30-18:00 Session 16D: Tutorial: Programming Distributed Platforms with PyCOMPSs
Chairs:
Rosa Badía (Barcelona Supercomputing Center, Spain)
Javier Conejero (BSC, Spain)
Location: Latina