09:00 | Towards High Level Synthesis Support in the European Deep Learning Library ABSTRACT. Support for inference processes on FPGA devices is common today. Indeed, FPGAs are a cost-effective and energy-efficient alternative to other architectures such as CPUs or GPUs. With the aim of providing a European toolkit for Deep Learning, the H2020 DeepHealth project is developing the EDDL library (European Distributed Deep-learning Library). This library targets distributed training on HPC infrastructures with CPUs and GPUs, and support is also being provided for inference processes on FPGA devices. In this talk, we will review the work carried out within the DeepHealth project (and other associated projects) to support efficient inference with EDDL on FPGA devices. High Level Synthesis (HLS) is used to develop accelerator units driven by the EDDL API. We will review the design alternatives and the choices we made, comparing our solution with the more standard accelerator units available today. |
09:50 | PRESENTER: Ji Liu ABSTRACT. In deep neural networks, using more layers and parameters generally improves model accuracy, but it also makes the models bigger. Such big models have high computational complexity and large memory requirements, which exceed the capacity of small devices used for inference. Knowledge distillation is an efficient approach to compress a large deep model (a teacher model) into a compact model (a student model). Existing online knowledge distillation methods typically either exploit an extra data storage layer to store the knowledge or deploy the teacher model and the student model on the same computing resource, thus hurting elasticity and fault tolerance. In this paper, we propose an elastic deep learning framework, EDL-Dist, for large-scale knowledge distillation that efficiently trains the student model while exploiting elastic computing resources. The advantages of EDL-Dist are threefold. First, it decouples the inference and training processes so that they can use heterogeneous computing resources. Second, it can exploit dynamically available computing resources. Third, it supports fault tolerance during the training and inference processes of knowledge distillation. Our experimental validation, based on an industrial-strength implementation and real datasets, shows that the throughput of EDL-Dist is up to 181% higher than that of the baseline method (online knowledge distillation). |
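The abstract does not state which distillation loss EDL-Dist uses. Below is a minimal sketch of the classic soft-target loss (Hinton et al.), in which the student learns from the teacher's temperature-softened outputs. The `temperature` and `alpha` values are hypothetical. In a decoupled setup like the one described, `teacher_logits` would arrive from a separate inference worker rather than being computed next to the student.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distillation_loss(teacher_logits, student_logits, label, temperature=4.0, alpha=0.5):
    """Blend of the soft-target KL term and the hard-label cross-entropy (hypothetical weights)."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # Multiplying by T^2 keeps soft-target gradients comparable across temperatures.
    soft = kl_divergence(p_teacher, p_student) * temperature ** 2
    hard = -math.log(softmax(student_logits)[label])
    return alpha * soft + (1 - alpha) * hard
```

A student whose logits already match the teacher's pays only the hard-label term. A mismatched student pays both terms.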
09:00 | Smart Contract Based Public Procurement to Fight Corruption |
09:50 | Towards A Broadcast Time-Lock Based Token Exchange Protocol PRESENTER: Fadi Barbara ABSTRACT. Many proposals for token exchange mechanisms between multiple parties have points of centralization. This prevents a completely trustless and secure exchange between those parties. The main issue lies in the fact that communications in projects using a blockchain are asynchronous: a classical result asserts that in an asynchronous system a secure exchange of secrets is impossible unless there is a trusted third party. In this paper, we present our preliminary results in the design of our Broadcast Time-Lock Exchange (BTLE) protocol. The core of BTLE is the introduction of synchronicity in communications through the use of time-lock puzzles. This makes it possible to exchange secrets between two parties while eliminating the need for a trusted third party. |
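The abstract does not specify which time-lock puzzle BTLE builds on. For illustration, here is a sketch of the classic Rivest-Shamir-Wagner repeated-squaring puzzle. The creator knows the factorization of `n` and can lock a secret cheaply. Anyone else must perform `t` inherently sequential squarings to open it, which is what lets "time" stand in for synchrony. The function names are hypothetical, and the toy primes are far too small to be secure.

```python
def make_puzzle(secret, p, q, t, a=2):
    """Creator side: lock `secret` so that opening it needs t sequential squarings mod n.

    Knowing phi(n) = (p-1)(q-1) lets the creator compute a^(2^t) mod n with one
    reduced exponentiation (Euler's theorem; requires gcd(a, n) == 1).
    """
    n = p * q
    phi = (p - 1) * (q - 1)
    b = pow(a, pow(2, t, phi), n)  # a^(2^t) mod n via the shortcut
    locked = (secret + b) % n      # RSW hides the key as key + b mod n
    return n, a, t, locked

def solve_puzzle(n, a, t, locked):
    """Solver side: without phi(n), perform t sequential squarings."""
    b = a % n
    for _ in range(t):
        b = b * b % n
    return (locked - b) % n
```

The asymmetry between the two functions is the point: `make_puzzle` costs one modular exponentiation regardless of `t`, while `solve_puzzle` costs `t` squarings that cannot be parallelized.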
10:50 | Porting Sparse Linear Algebra to Intel GPUs PRESENTER: Yu-Hsiang Tsai ABSTRACT. With discrete Intel GPUs entering the high performance computing landscape, there is an urgent need for production-ready software stacks for these platforms. In this paper, we report how we prepare the Ginkgo math library for Intel GPUs by developing a kernel backend based on the DPC++ programming environment. We discuss conceptual differences from the CUDA and HIP programming models and describe workflows for simplified code conversion. We benchmark advanced sparse linear algebra routines built on the converted kernels to assess the efficiency of the DPC++ backend relative to the hardware-specific performance bounds. We compare the performance of basic building blocks against routines providing the same functionality that ship with Intel's oneMKL vendor library. |
11:15 | Accelerating FFT using NEC SX-Aurora Vector Engine PRESENTER: Pablo Vizcaino ABSTRACT. Novel architectures leveraging long and variable vector lengths, like the NEC SX-Aurora or the vector extension of RISC-V, are appearing as promising solutions on the supercomputing market. These architectures often require re-coding of scientific kernels. For example, traditional implementations of algorithms for computing the fast Fourier transform (FFT) cannot take full advantage of vector architectures. In this paper, we present implementations of FFT algorithms able to leverage these novel architectures. We evaluate these codes on NEC SX-Aurora, comparing them with the optimized NEC libraries. We present the benefits and limitations of two approaches to radix-2 FFT vector implementation. We show that our approach makes better use of the vector unit, reaching higher performance than the optimized NEC library for FFT sizes under 64k elements. More generally, we show the importance of maximizing the vector length usage of the algorithm, and that adapting the algorithm to replace memory instructions with register shuffling operations can boost the performance of FFT-like computational kernels. |
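For readers unfamiliar with the radix-2 structure discussed above, here is a textbook scalar sketch of the iterative Cooley-Tukey radix-2 FFT. It is not the authors' vectorized SX-Aurora code. It shows the bit-reversal step and the per-stage butterfly loop whose independent iterations a vector engine maps onto long vector registers.

```python
import cmath

def fft_radix2(x):
    """Iterative radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a nonzero power of two")
    a = [complex(v) for v in x]
    # Bit-reversal permutation so that the butterflies can work in place.
    rev = 0
    for i in range(1, n):
        bit = n >> 1
        while rev & bit:
            rev ^= bit
            bit >>= 1
        rev ^= bit
        if i < rev:
            a[i], a[rev] = a[rev], a[i]
    # log2(n) stages; each stage applies n/2 independent butterflies,
    # the loop a vector unit would spread across its lanes.
    size = 2
    while size <= n:
        half = size // 2
        for start in range(0, n, size):
            for k in range(half):
                w = cmath.exp(-2j * cmath.pi * k / size)  # twiddle factor
                u = a[start + k]
                v = a[start + k + half] * w
                a[start + k] = u + v
                a[start + k + half] = u - v
        size *= 2
    return a
```

In early stages the butterflies are short and numerous, and in late stages they are long and few. Keeping vector lengths high across all stages is the kind of reorganization the abstract refers to.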
11:40 | A Novel Bi-Objective Optimization Algorithm on Heterogeneous HPC Platforms for Applications with Continuous Performance and Linear Energy Profiles PRESENTER: Hamidreza Khaleghzadeh ABSTRACT. Performance and energy are the two most important objectives for optimization on modern heterogeneous HPC platforms. In this work, we study a mathematical problem motivated by the bi-objective optimization of a matrix multiplication application on such platforms for performance and energy. We formulate the problem and propose an algorithm of polynomial complexity that solves it for the case where all the application profiles of objective type one are continuous and strictly increasing, and all the application profiles of objective type two are linear increasing. We solve the problem for the matrix multiplication application employing five heterogeneous processors: two Intel multicore CPUs, an Nvidia K40c GPU, an Nvidia P100 PCIe GPU, and an Intel Xeon Phi. Based on our experiments, a dynamic energy saving of 17% is gained while tolerating a performance degradation of 5% (a saving of 106 Joules for an execution time increase of 0.05 seconds). |
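Bi-objective optimization produces a Pareto front of (time, energy) trade-offs rather than a single optimum. The sketch below is a generic Pareto filter plus a "minimum energy within a time tolerance" selection, mirroring the 5% trade-off quoted above. It is not the authors' polynomial-complexity algorithm, and the candidate points are hypothetical.

```python
def pareto_front(points):
    """Return the (time, energy) points not dominated by any other point.

    A point dominates another if it is no worse in both objectives and
    strictly better in at least one.
    """
    front = []
    best_energy = float("inf")
    # Sweep in increasing time (ties broken by energy); a point survives only
    # if it strictly improves on the lowest energy seen so far.
    for t, e in sorted(set(points)):
        if e < best_energy:
            front.append((t, e))
            best_energy = e
    return front

def min_energy_within(front, tolerance):
    """Pick the lowest-energy point whose time is within (1 + tolerance) of the fastest."""
    fastest = min(t for t, _ in front)
    feasible = [p for p in front if p[0] <= fastest * (1 + tolerance)]
    return min(feasible, key=lambda p: p[1])
```

Example with hypothetical workload distributions measured as (seconds, joules):

```python
points = [(1.00, 620.0), (1.02, 560.0), (1.04, 515.0), (1.10, 515.0), (1.20, 600.0), (1.30, 480.0)]
front = pareto_front(points)
choice = min_energy_within(front, 0.05)
```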
12:05 | Towards an efficient sparse storage format for the SpMM kernel in GPUs PRESENTER: Ernesto Dufrechou ABSTRACT. The sparse matrix-matrix multiply kernel (SpMM) has gained significant interest in recent years due to its applications in data science. In 2018, Zhang and Gruenwald [15] proposed the bitmap-based sparse format bmSparse and described in detail the implementation of SpMM for Nvidia GPUs. The novel format is promising in terms of both performance and storage space. In this work, we re-implement the algorithm following the authors’ guidelines, adding two new stages that can improve performance. Experiments performed on nine sparse matrices of different sizes show significant speedups with respect to the CSR variant of cuSPARSE. |
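To make the idea of a bitmap-based sparse format concrete, here is a hypothetical tile encoding in the spirit of bmSparse. It does not reproduce the exact layout from Zhang and Gruenwald. It assumes 8x8 tiles: each nonempty tile stores a 64-bit occupancy mask plus its nonzero values in row-major order, so tile positions are implicit in the mask rather than stored as per-element indices.

```python
TILE = 8  # assumed tile edge; an 8x8 tile fits one 64-bit occupancy mask

def encode_bitmap_tiles(dense):
    """Encode a dense matrix as {(tile_row, tile_col): (mask, values)}.

    Only tiles holding at least one nonzero are stored. Bit (r*TILE + c) of the
    mask marks a nonzero at local position (r, c); values follow the same
    row-major order as the set bits.
    """
    rows, cols = len(dense), len(dense[0])
    tiles = {}
    for tr in range(0, rows, TILE):
        for tc in range(0, cols, TILE):
            mask, values = 0, []
            for r in range(TILE):
                for c in range(TILE):
                    i, j = tr + r, tc + c
                    if i < rows and j < cols and dense[i][j] != 0:
                        mask |= 1 << (r * TILE + c)
                        values.append(dense[i][j])
            if mask:
                tiles[(tr // TILE, tc // TILE)] = (mask, values)
    return rows, cols, tiles

def decode_bitmap_tiles(rows, cols, tiles):
    """Rebuild the dense matrix from the tile map (used here to check round-trips)."""
    dense = [[0] * cols for _ in range(rows)]
    for (ti, tj), (mask, values) in tiles.items():
        k = 0
        for bit in range(TILE * TILE):
            if (mask >> bit) & 1:
                r, c = divmod(bit, TILE)
                dense[ti * TILE + r][tj * TILE + c] = values[k]
                k += 1
    return dense
```

On a GPU, a population count on the mask gives each thread its offset into `values`, which is one reason bitmap formats can be compact and fast. The SpMM kernel itself is not shown.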
10:50 | Decentralisation Over Privacy: An Analysis Of The Bisq Trade Protocol PRESENTER: Liam Hickey ABSTRACT. The Bisq trade protocol is a key component of the Bisq decentralised exchange, allowing users to trade with one another in a decentralised manner. However, the protocol publishes trade data to the Bitcoin blockchain. In this paper, we analyse the privacy risks this creates for users. Specifically, we present two new heuristics, one to identify Bisq trades on the Bitcoin blockchain and another to cluster the addresses used in those trades. We demonstrate that these heuristics are effective in identifying the trading activity of Bisq users and aggregating their trading activity across multiple trades. We conclude with suggestions as to how best to defeat these heuristics and improve the privacy aspects of the Bisq trade protocol. |
11:15 | DoS Attacks on Blockchain Ecosystem PRESENTER: Mayank Raikwar ABSTRACT. Denial of Service (DoS) attacks are a growing threat to network services. The frequency and intensity of DoS attacks are rapidly increasing day by day. The immense financial potential of the cryptocurrency market makes it a prevalent target of DoS attacks, and such attacks keep occurring against cryptocurrencies and the blockchain ecosystem. To the best of our knowledge, there has not been any study of DoS attacks on the blockchain ecosystem. In this paper, we identify ten entities in the blockchain ecosystem and scrutinize the DoS attacks on them. We also present the DoS mitigation techniques applicable to blockchain services. Additionally, we propose a DoS mitigation technique based on a verifiable delay function (VDF). |
11:40 | PRESENTER: Francesco Faloci ABSTRACT. Nowadays, notarization for supply chain tracing is among the most widely used non-financial blockchain applications. However, creating a blockchain-based system for the management of a supply chain remains a complex task. In this paper, we propose a graphical domain-specific language (DSL) and a tool that allow supply chain domain experts to easily represent the supply chain they need to trace. The graphical representation of the supply chain is then automatically translated into a set of Solidity smart contracts implementing it. A small intervention by a programmer is required to customize and finalize these smart contracts. The resulting semi-automatic process of smart contract generation will boost the use of blockchain for supply chain traceability. |
13:30 | Heterogeneous Voltage Frequency Scaling of Data-Parallel Applications for Energy Saving on Homogeneous Multicore Platforms PRESENTER: Pawel Bratek ABSTRACT. In this paper, for the first time, we explore and establish the combined benefits of heterogeneous DVFS (dynamic voltage and frequency scaling) control in improving the energy-performance behavior of data-parallel applications on shared-memory multicore systems. We propose to customize the clock frequency individually for appropriately selected groups of cores, according to the differing times they spend on actual computation. As a result, an advantage of up to 20 percentage points over homogeneous frequency scaling is achieved on a ccNUMA server with two 18-core Intel Xeon Gold 6240 processors (72 logical cores in total). The cost and efficiency of the proposed pruning algorithm for selecting heterogeneous DVFS configurations are verified experimentally and compared against brute-force search. |
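The intuition behind heterogeneous DVFS can be sketched with a deliberately simple model that is not the paper's pruning algorithm. Assume each core group's compute time is `work / frequency`. The most loaded group runs at the top frequency and sets the deadline, and every other group gets the lowest available frequency that still finishes by that deadline. The lighter groups then stop burning energy at high clocks without lengthening the parallel region. All names and numbers are hypothetical.

```python
def assign_group_frequencies(group_work, frequencies):
    """Pick a clock frequency per core group under a time = work / frequency model.

    group_work: amount of actual computation per group (e.g. Gcycles).
    frequencies: available frequencies (e.g. GHz), in any order.
    """
    freqs = sorted(frequencies)
    f_max = freqs[-1]
    # The critical (most loaded) group at the highest frequency sets the deadline.
    deadline = max(group_work) / f_max
    chosen = []
    for work in group_work:
        # Smallest frequency meeting the deadline; f_max always qualifies.
        chosen.append(next(f for f in freqs if work / f <= deadline))
    return chosen
```

The paper searches a richer configuration space (groups, frequencies, energy measurements) and compares its pruning against brute force. This greedy rule only illustrates why the per-group choices differ.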
13:55 | Data management model to program irregular compute kernels on FPGA: application to heterogeneous distributed system PRESENTER: Erwan Lenormand ABSTRACT. To meet the growing need for computing power under energy consumption constraints, computer systems are increasingly heterogeneous. These systems are more complex to program, especially for applications with irregular memory access patterns. This paper presents a data management model targeting heterogeneous distributed systems that integrate reconfigurable accelerators. The purpose of this model is to reduce the complexity of developing applications with multidimensional sparse data structures. It relies on the shared-memory paradigm, which is convenient for parallel programming of irregular applications. The distributed data, sliced into chunks, are managed by a Software-Distributed Shared Memory (S-DSM). Integrating reconfigurable accelerators into this S-DSM breaks the master-slave model and allows devices to initiate accesses to chunks. We use chunk partitioning of multidimensional sparse data structures, such as sparse matrices and unstructured meshes, to access them as a continuous data stream. This model makes it possible to regularize the memory accesses of irregular applications, to avoid transferring unnecessary data by providing fine-grained data access, and to efficiently hide data access latencies by implicitly overlapping the transferred data flow with the processed data flow. We used two case studies to validate the proposed data management model: General Sparse Matrix-Matrix Multiplication (SpGEMM) and Shallow Water Equations (SWE). The results show that the proposed model efficiently hides data access latencies, reaching computation speeds close to those of an ideal case (i.e., without latency). |
14:20 | Domain-Specific Runtime to Orchestrate Computation on Heterogeneous Platforms PRESENTER: Jared O'Neal ABSTRACT. Task-based runtime systems have in the past sought to exploit inherent asynchronicity in application execution to reduce overall execution time. In the last decade, the focus shifted to supporting the heterogeneity that is increasingly prevalent in high-performance computing systems. Existing task-based runtime systems are designed to be general and thus come with a challenging set of issues, such as complex abstractions and overheads. Much of the burden of exposing heterogeneous parallelism is left to application developers, who have to fit domain-specific code to the interfaces of general-purpose runtimes. This paper presents a different approach, targeting heterogeneous systems through domain-specific runtimes. Our design aims to leverage the domain-specific knowledge of a focused class of scientific simulations to pragmatically orchestrate the computations in these simulations. |
13:30 | Integrating Fog Computing and Blockchain Technology for Applications with Enhanced Trust and Privacy ABSTRACT. Blockchain was first introduced in 2009 as the technology behind the Bitcoin digital currency. It is the backbone technology for many Distributed Ledger and Distributed Computing applications, such as digital cryptocurrencies and smart contracts. Solutions built on it provide high levels of security and trust, while guaranteeing an immutable transaction history without the interference or control of a central authority. For more than a decade, Blockchain applications have been proposed in a wide variety of environments such as the Internet of Things, Fog Computing, Artificial Intelligence, Mobile Computing, etc. The IoTCloud group at the University of Szeged, Hungary, started investigating Blockchain and Fog Computing integration possibilities in 2019. The advantages of Blockchain-Fog integration include enhanced security, integrity, reliability, and fault tolerance, due to the decentralization and trust management mechanisms they provide. In this talk I summarize the results we achieved in Blockchain-Fog integration, then present our current work and future directions. First, we conducted a survey to highlight the roles Blockchains play in cloud and fog systems, and presented how the corresponding research communities envisage future Blockchain-Fog integration. Based on the identified challenges, we turned our attention to addressing some of these issues. To enhance task scheduling in cloud systems with Blockchain-Fog extensions, we proposed an Ant Colony Optimization algorithm in a fog-enabled, Blockchain-assisted scheduling model called PF-BTS. It exploits Blockchain miners to generate efficient assignments of tasks to be executed in the cloud, and rewards miner nodes for their contribution to generating the best schedule.
To enable the development and evaluation of methods for Blockchain-Fog integration, we designed a special-purpose simulator called FoBSim. When realizing such integration, a Blockchain can be placed in the fog layer, the end-user layer, or the cloud layer. The simulator can investigate the differences between such placements and their main properties for executing applications with different requirements. The simulator provides several built-in consensus algorithms, as well as different deployment options and functionalities for Blockchain integration. In our current work we aim to investigate the advantages of Blockchain-Fog integration for real-world applications. We have already started to develop a Blockchain-assisted credential management system, which could be used by applications needing trustworthy, privacy-aware, distributed credential issuing and validation. The currently targeted use cases include international student diploma management and COVID-19 immunity passport management. In our future work, we plan to extend the FoBSim environment with additional consensus algorithms and a graphical interface to ease its use. |
14:20 | SMART: a Tool for Trust and Reputation Management in Social Media PRESENTER: Nishant Saurabh ABSTRACT. Social media platforms are becoming increasingly popular and essential for next-generation connectivity. However, the emergence of social media also poses critical trust challenges due to the vast amount of content created and propagated. This paper proposes a data-driven tool called SMART for trust and reputation management based on community engagement and a rescaled sigmoid model. SMART's integrated design adopts a set of expert systems with a unique inference logic for trust estimation to compute weighted trust ratings of social media content. SMART further uses the trust ratings to compute user reputation and represents it using a sigmoid curve that prevents a user from accumulating reputation without bound. We demonstrate the SMART tool prototype using a pilot social media application and highlight its user-friendly interfaces for trustworthy content exploration. |
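The abstract says a sigmoid curve keeps reputation from growing without bound, but it does not give SMART's exact rescaling. Below is a minimal sketch, assuming a logistic curve applied to the sum of a user's weighted trust ratings, with hypothetical parameters `r_max`, `steepness` and `midpoint`.

```python
import math

def reputation(weighted_trust_ratings, r_max=100.0, steepness=0.1, midpoint=20.0):
    """Map accumulated weighted trust ratings onto a reputation bounded by r_max.

    A rescaled logistic curve saturates at r_max, so piling up ratings gives
    diminishing gains instead of unbounded reputation.
    """
    x = steepness * (sum(weighted_trust_ratings) - midpoint)
    # Two algebraically equal forms keep math.exp from overflowing.
    if x >= 0:
        return r_max / (1.0 + math.exp(-x))
    z = math.exp(x)
    return r_max * z / (1.0 + z)
```

At the midpoint a user sits at half of `r_max`. Each additional rating moves the score less as the curve flattens toward its ceiling.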
15:15 | Kernel Fusion in OpenCL PRESENTER: John Stratton ABSTRACT. Kernel fusion is a widely applicable optimization for numerical libraries on heterogeneous systems. However, most automated systems capable of performing the optimization require changes to software development practices, through language extensions or constraints on software organization and compilation. This makes such techniques inapplicable to preexisting software in a language like OpenCL. This work introduces an implementation of kernel fusion that can be deployed fully within the defined role of the OpenCL library implementation. This means that programs written without any explicit programmer intervention, and even precompiled OpenCL applications, can benefit from the optimization. Despite the lack of explicit programmer effort, our compiler delivered an average speedup of 12.3% over a range of applicable benchmarks on a target CPU platform. |
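As a conceptual illustration of what kernel fusion does, and not of the paper's OpenCL compiler, the plain Python functions below stand in for device kernels. Two element-wise kernels communicate through an intermediate buffer. The fused kernel computes the same result in a single pass, eliminating one launch and the write and re-read of that buffer.

```python
def scale_kernel(xs, alpha):
    """First 'kernel': writes an intermediate buffer y = alpha * x."""
    return [alpha * x for x in xs]

def add_kernel(ys, zs):
    """Second 'kernel': reads the intermediate buffer and writes out = y + z."""
    return [y + z for y, z in zip(ys, zs)]

def fused_kernel(xs, zs, alpha):
    """Fused version: one pass over the data, no intermediate buffer."""
    return [alpha * x + z for x, z in zip(xs, zs)]
```

The transformation is only valid when the consumer reads each intermediate element at the same index the producer wrote it, as in this element-wise case. Checking such conditions automatically is what a transparent fusion implementation has to do.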
15:40 | Feasibility study of Molecular Dynamics kernels exploitation using EngineCL PRESENTER: Raúl Nozal ABSTRACT. The ubiquity of heterogeneous systems facilitates tackling scientific problems, mainly due to the performance and energy efficiency characteristics of their devices. Molecular dynamics simulators are an application field of particular interest because of their potential savings in execution time and energy consumption. For this reason, they have been optimized for years for multi-core HPC architectures, which makes it difficult to port them to other architectures that allow simultaneous computation on several devices. In this work, a runtime designed for high usability and performance is extended to enable efficient co-execution of molecular dynamics kernels. Several contributions are made, including support for a new execution core and a hybrid co-execution mode, solving the problems encountered when running only with OpenCL-based technologies. Experimental evaluation shows improvements in all the kernels studied, with average speedups of up to 1.38x in performance and 1.60x in energy efficiency over the current optimized version. |
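Co-execution means running one kernel simultaneously on several devices, each processing part of the index space. One simple policy, shown here as a hypothetical sketch rather than EngineCL's actual API or schedulers, splits the work statically in proportion to each device's relative speed and rounds shares to a work-group multiple.

```python
def static_split(total_items, device_speeds, granularity=64):
    """Split a 1D index space among devices in proportion to their speed.

    Each chunk except the last is a multiple of `granularity` (e.g. a
    work-group size); the last device absorbs the remainder. Speeds are
    relative and hypothetical, e.g. obtained from a profiling run.
    """
    total_speed = sum(device_speeds)
    chunks = []
    assigned = 0
    for speed in device_speeds[:-1]:
        share = int(total_items * speed / total_speed) // granularity * granularity
        chunks.append(share)
        assigned += share
    chunks.append(total_items - assigned)
    return chunks
```

A static split works well when device speeds are stable. Dynamic or guided schemes that hand out chunks on demand cope better with irregular kernels, at the cost of more coordination.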
16:05 | PRESENTER: Paweł Koperek ABSTRACT. Deep Reinforcement Learning has recently been a very active field of research. The policies generated with this class of training algorithms are flexible and thus have many practical applications. In this paper we present the results of our attempt to use recent advancements in Reinforcement Learning to automate the management of resources in a compute cloud environment. We describe a new approach to self-adaptation of autonomous management, which uses a digital clone of the managed infrastructure to continuously update the control policy. We present the architecture of our system and discuss the results of an evaluation that includes autonomous management of a sample application deployed to the Amazon Web Services cloud. We also provide details of training the management policy using the Proximal Policy Optimization algorithm. Finally, we discuss the feasibility of extending the presented approach to further scenarios. |
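The abstract names Proximal Policy Optimization as the training algorithm. Its core is the clipped surrogate objective (Schulman et al., 2017), sketched below on a batch of precomputed probability ratios and advantage estimates. How the authors compute advantages or structure their training loop is not described, so only the objective itself is shown.

```python
def ppo_clip_objective(ratios, advantages, epsilon=0.2):
    """Mean clipped surrogate objective of PPO (to be maximized).

    ratios[i] = pi_new(a_i | s_i) / pi_old(a_i | s_i); advantages[i] = estimated A_i.
    Clipping the ratio to [1 - epsilon, 1 + epsilon] removes the incentive to
    move the policy far from the one that collected the data.
    """
    total = 0.0
    for r, a in zip(ratios, advantages):
        clipped = min(max(r, 1.0 - epsilon), 1.0 + epsilon)
        total += min(r * a, clipped * a)
    return total / len(ratios)
```

Taking the minimum of the clipped and unclipped terms makes the objective a pessimistic bound. Gains from pushing a ratio beyond the clip range are ignored, while losses are still counted.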
16:30 | An Automata–based Approach to Profit Optimization of Cloud Brokers in IaaS Environment PRESENTER: Jakub Gąsior ABSTRACT. We consider the problem of profit optimization for a cloud brokerage service in the IaaS environment. We reformulate this optimization problem using a game-theoretic approach in which players seek a solution by reaching a Nash equilibrium. We propose a fully distributed algorithm based on the Spatial Prisoner’s Dilemma (SPD) game and on the phenomenon of collective behavior among players participating in the game, which is composed of two classes of automata-based agents: Cellular Automata (CA) and Learning Automata (LA). We introduce dynamic strategies such as local profit sharing, mutation, and competition, which stimulate the evolutionary process of developing collective behavior among players to maximize their profit margin. We present the results of an experimental study showing the emergence of collective behavior in such systems. |
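The sketch below illustrates the basic Spatial Prisoner's Dilemma dynamic on a cellular automaton: play with neighbours, then imitate the most successful neighbour. It is a Nowak-May-style round with assumed payoffs and a Moore neighbourhood on a torus. It omits the paper's learning automata and its profit-sharing, mutation and competition strategies.

```python
def spd_step(grid, b=1.8):
    """One synchronous round of a Nowak-May-style spatial Prisoner's Dilemma.

    grid: list of lists of 'C' (cooperate) or 'D' (defect), wrapped as a torus.
    Payoffs (assumed): C vs C -> 1, D vs C -> b, anything vs D -> 0.
    Each cell plays its 8 neighbours, then copies the strategy of the
    highest-scoring cell in its neighbourhood (itself included; ties keep its own).
    """
    n, m = len(grid), len(grid[0])
    offsets = [(di, dj) for di in (-1, 0, 1) for dj in (-1, 0, 1) if (di, dj) != (0, 0)]

    def payoff(me, other):
        if other == 'D':
            return 0.0
        return 1.0 if me == 'C' else b

    score = [[sum(payoff(grid[i][j], grid[(i + di) % n][(j + dj) % m]) for di, dj in offsets)
              for j in range(m)] for i in range(n)]

    new_grid = []
    for i in range(n):
        row = []
        for j in range(m):
            best_i, best_j = i, j
            for di, dj in offsets:
                ni, nj = (i + di) % n, (j + dj) % m
                if score[ni][nj] > score[best_i][best_j]:
                    best_i, best_j = ni, nj
            row.append(grid[best_i][best_j])
        new_grid.append(row)
    return new_grid
```

With a single defector in a field of cooperators, the defector outscores its neighbours and they copy it, producing a 3x3 block of defectors after one round. Mechanisms such as local profit sharing are meant to counteract exactly this kind of spread.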
15:15 | Towards Generating Realistic Trace for Simulating Functions-as-a-Service PRESENTER: Dilshad Hassan Sallo ABSTRACT. Serverless computing is a step towards a cloud environment that responds to user requests while the platform, rather than the user, takes care of managing infrastructure, resources and configurations. Despite the widespread use of cloud simulators, they still mainly support traditional Infrastructure-as-a-Service scenarios, which reduces their applicability in the serverless and Function-as-a-Service domains. Moreover, workload traces typically employed by IaaS simulators to represent user behaviour are not well suited to the serverless model. More realistic, serverless-like traces are essential to simulate and predict the behaviour of functions in serverless systems. Therefore, this paper focuses on generating realistic traces for simulating serverless computing platforms. The traces produced by our approach are based on the Azure Functions dataset and are readily applicable in an already available, versatile and high-performance simulator (DISSECT-CF). We validated the approach using the coefficient of determination (R²), which shows very good values for the average and percentiles of the execution time and memory. To demonstrate the benefits of the generated traces, we introduced a rudimentary model for serverless systems into DISSECT-CF. Our evaluation shows that our workloads are realistic and closely follow the behaviour of Azure's Function-as-a-Service component. |
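The validation above relies on the coefficient of determination. For reference, here is a minimal implementation of R² = 1 - SS_res / SS_tot. It could be used, for example, to compare per-percentile execution times of a generated trace against the source dataset. The comparison data itself is hypothetical.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    observed: reference values (e.g. statistics from the original dataset).
    predicted: corresponding values from the generated trace.
    """
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((y - f) ** 2 for y, f in zip(observed, predicted))
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    return 1.0 - ss_res / ss_tot
```

A value of 1 means the generated statistics match exactly. A value of 0 means they explain no more variance than always predicting the mean of the observed values.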
15:40 | SPIRIT: A microservice-based framework for interactive Cloud infrastructure planning PRESENTER: Spiros Koulouzis ABSTRACT. The IaaS model provides elastic infrastructure that enables the migration of legacy applications to cloud environments. Many cloud computing vendors such as Amazon Web Services, Microsoft Azure, and Google Cloud Platform offer a pay-per-use policy that allows for a sustainable reduction in costs compared to on-premise hosting, and enables users to choose among various geographically distributed data centers. State-of-the-art planning algorithms can help application owners estimate the size and characteristics of the underlying cloud infrastructure. However, it is not always clear which solution is optimal, especially in multi-cloud environments with complex application requirements and QoS constraints. In this paper, we propose an open framework named SPIRIT, which allows users to include cloud infrastructure planning algorithms and to evaluate and compare their solutions. SPIRIT achieves this by letting users interactively study infrastructure planning algorithms, adjusting parameters via a graphical user interface that visualizes the results of these algorithms. In the current prototype, we have included the IaaS Partial Critical Path algorithm. Thanks to SPIRIT's microservice-based architecture and its generic interfaces, users can add new planning algorithms to the framework. SPIRIT is able to transform an abstract workflow described using the Common Workflow Language (CWL) into a concrete infrastructure described using the TOSCA specification. This way, the infrastructure descriptions can be ranked on various key performance indicators. |
16:10 | PRESENTER: David E. Singh ABSTRACT. The transmission of COVID-19 through a population depends on many factors, and modeling it requires incorporating and integrating a large number of heterogeneous data sources. The work we describe in this paper focuses on the data management aspect of EpiGraph, a scalable agent-based virus-propagation simulator. We describe the data acquisition and pre-processing tasks that are necessary to map the data to the different models implemented in EpiGraph in a way that is efficient and comprehensible. We also report on the post-processing, analysis, and visualization of the outputs, tasks that are fundamental to making the simulation results useful for end users. Our simulator captures complex interactions between social processes, virus characteristics, travel patterns, climate, vaccination, and non-pharmaceutical interventions. We end by demonstrating the entire pipeline with an evaluation for Spain of the third COVID wave, starting on December 27, 2020. |
16:35 | Merging Real Images with Physics Simulations via Data Assimilation PRESENTER: Rossella Arcucci ABSTRACT. This work started from the need to improve the accuracy of numerical simulations of COVID-19 transmission. Coughing is one of the most effective ways to transmit SARS-CoV-2, the strain of coronavirus that causes COVID-19. Cough is a spontaneous reflex that helps to protect the lungs and airways from unwanted irritants and pathogens, and it involves droplet expulsion at speeds close to 50 miles/hour. Unfortunately, it is also one of the most efficient ways to spread diseases, especially respiratory viruses that need host cells in which to reproduce. Computational Fluid Dynamics (CFD) is a powerful way to simulate droplets expelled from the mouth and nose when people cough and/or sneeze. As with all numerical models, the models for coughing and sneezing introduce uncertainty through the selection of scales and parameters. Accounting for these uncertainties is essential for the acceptance of any numerical simulation. Numerical forecasting models often use Data Assimilation (DA) methods for uncertainty quantification in medium- to long-term analysis. DA approximates the true state of a physical system at a given time by optimally combining time-distributed observations with a dynamic model. DA incorporates observational data into a prediction model to improve numerical forecast results. In this paper, we develop a Variational Data Assimilation model to assimilate direct observations of the physical mechanisms of droplet formation at the exit of the mouth during coughing. Specifically, we use high-speed imaging from prior research work, which directly examines fluid fragmentation at the exit of the mouths of healthy subjects in a sneezing condition. We show the impact of the proposed approach in terms of accuracy with respect to CFD simulations. |
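To make "variational" concrete, here is the textbook 3D-Var idea reduced to one scalar state. The analysis minimizes a cost that penalizes distance from the model background `xb` (error variance `B`) and from the observation `y` (error variance `R`, observation operator `H`). The paper's model is high-dimensional and assimilates imaging data into CFD. This sketch only shows the principle, with hypothetical values, and checks gradient descent against the closed-form minimizer.

```python
def cost(x, xb, y, B, R, H=1.0):
    """3D-Var cost: background misfit plus observation misfit, each weighted by its error variance."""
    return 0.5 * (x - xb) ** 2 / B + 0.5 * (y - H * x) ** 2 / R

def analysis_by_descent(xb, y, B, R, H=1.0, lr=0.1, steps=500):
    """Minimize the cost with plain gradient descent, starting from the background."""
    x = xb
    for _ in range(steps):
        grad = (x - xb) / B - H * (y - H * x) / R
        x -= lr * grad
    return x

def analysis_closed_form(xb, y, B, R, H=1.0):
    """Same minimum in closed form: x_a = x_b + K (y - H x_b), with gain K = B H / (H^2 B + R)."""
    K = B * H / (H * H * B + R)
    return xb + K * (y - H * xb)
```

With equal variances the analysis lands halfway between background and observation. A less trustworthy observation (larger `R`) pulls it less.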