Days: Wednesday, December 14th Thursday, December 15th Friday, December 16th
Vladimir Voevodin
- Deputy Director, Research Computing Center, Lomonosov Moscow State University
- Head of the Department on Supercomputers and Quantum Informatics, Computational Mathematics and Cybernetics Faculty, MSU
The computing world is changing rapidly. All devices, from mobile phones and personal computers to high-performance supercomputers, are becoming parallel. The huge capacity of modern supercomputers allows complex problems, previously thought impossible, to be solved. The performance of the best supercomputers in the world is measured in petaflops, providing unprecedentedly powerful instruments for research. At the same time, the efficient use of all the opportunities offered by modern computing systems represents a global challenge and requires new knowledge, skills and abilities, among which an understanding of the key properties of parallel algorithms plays a central role. The talk will address the urgent need for theoretical and practical technologies for the accurate and concerted design of high performance computing systems, highly parallel algorithms, and extreme-scale applications, so that large problems can be solved using the current and prospective generations of high performance computing systems. The most essential concept behind these technologies is co-design, which is a very close partnership or interrelationship between all the layers involved in the process of solving these problems on high performance computing systems: mathematical methods, algorithms, applications, programming technologies, runtime systems, layers of system software and hardware. The notion of co-design is so important for HPC that the following thesis is definitely true: “No efficient co-design technologies, no reasonable exascale in the future”.
Vladimir Voevodin is Deputy Director of the Research Computing Center at Lomonosov Moscow State University. He is Head of the Department “Supercomputers and Quantum Informatics” at the Computational Mathematics and Cybernetics Faculty of MSU, a professor, and a corresponding member of the Russian Academy of Sciences.
Vl. Voevodin specializes in parallel computing, supercomputing, extreme computing, program tuning and optimization, fine structure of algorithms and programs, parallel programming technologies, scalability and efficiency of supercomputers and applications, supercomputing co-design technologies, software tools for parallel computers, and supercomputing education. His research, experience and knowledge became a basis for the supercomputing center of Moscow State University, which was founded in 1999 and is currently the largest supercomputing center in Russia. He has contributed to the design and implementation of the following tools, software packages, systems and online resources: V-Ray, X-Com, AGORA, Parallel.ru, hpc-education.ru, hpc-russia.ru, LINEAL, Sigma, Top50, OctoShell, Octotron, AlgoWiki. He has published 90 scientific papers, with 4 books among them. Vl. Voevodin is one of the founders of the Supercomputing Consortium of Russian Universities, established in 2008, which currently comprises more than 60 members. He is a leader of the major national activities on Supercomputing Education in Russia and General Chair of the two largest Russian supercomputing conferences.
10:30 | Redundancy Elimination in the ExaStencils Code Generator
11:00 | A Dataflow IR for Memory Efficient RIPL Compilation to FPGAs
11:30 | A new scalable approach for distributed metadata in HPC
12:00 | Enabling Android-based devices to high-end GPGPUs
10:30 | A GPU-based backtracking algorithm for permutation combinatorial problems
11:00 | Buffer Minimization for Rate-optimal Scheduling of Synchronous Dataflow Graphs on Multicore Systems
11:30 | Light Loss-Less Data Compression, with GPU implementation
12:00 | Deterministic Construction of Regular Geometric Graphs with Short Average Distance and Limited Edge Length
10:30 | Synthetic Traffic Model of the Graph500 Communications
11:00 | Road Segment Information based Named Data Networking for Vehicular Environments
11:30 | A Distributed Formal Model for the Analysis and Verification of Arbitration Protocols on MPSoCs Architecture
10:30 | Mixed Reality Cubicle and Cave Automatic Virtual Environment
11:00 | Toward Ubiquitous Data Processing
11:30 | BaaS-4US: A Framework to Develop Standard Backends as a Service for Ubiquitous Applications
14:00 | Deciding Soundness for Workflow Nets Based on Branching Processes
14:30 | Creating Distributed Execution Plans with BobolangNG
15:00 | A C++ Generic Parallel Pattern Interface for Stream Processing
15:30 | A Portable Lock-free Bounded Queue
14:00 | Scaling DBSCAN-like algorithms for event detection systems in Twitter
14:30 | Feedback Control Optimization for Performance and Energy Efficiency on CPU-GPU Heterogeneous Systems
15:00 | Implementing Snapshot Objects on Top of Crash-Prone Asynchronous Message-Passing Systems
15:30 | Towards Parallel CFD computation for the ADAPT framework
14:00 | An Automated Finger-Knuckle-Print Identification System Using Jointly RBF & RFT Classifiers
14:30 | Bluetooth Low Energy based Occupancy Detection for Emergency Management
15:00 | Long Life Application Approach for user context management and situation understanding
14:00 | Versatile mixed reality educational spaces: a medical education implementation case
14:30 | Feature Extraction for Emotion Recognition and Modelling using Neurophysiological Data
15:00 | Learning Physics in an Immersive and Interactive Virtual Reality Laboratory
15:30 | Engaging immersive video consumers: Challenges regarding 360-degree gamified video applications
16:30 | Optimized Mapping Spiking Neural Networks onto Network-on-Chip
17:00 | Intelligent SPARQL Endpoints: Optimizing Execution Performance by Automatic Query Relaxation and Queue Scheduling
16:30 | Improving the Performance of Cardiac Simulations in a Multi-GPU Architecture Using a Coalesced Data and Kernel Scheme
17:00 | An Efficient Implementation of LZW Compression in the FPGA
17:30 | Methodological Approach to Data-Centric Cloudification of Scientific Iterative Workflows
16:30 | An Empirical Study of Denial of Service (DoS) against VoIP
17:00 | Behaviour-based anomaly detection of cyber-physical attacks on a robotic vehicle
17:30 | On The Evaluation of Security Properties of Containerized Systems
16:30 | Interactive Virtual Archaeology: Constructing the prehistoric past at Avebury henge
17:00 | An Evaluation of the Efficacy of a Perceptually Controlled Immersive Environment for Learning Acupuncture
17:30 | Integrating virtual worlds with Learning Management Systems: the MULTIS approach
Pedro José Marrón
- Full Professor for Pervasive Computing at the University of Duisburg-Essen and Co-Founder of Locoslab GmbH
The world is changing at an extremely rapid pace and it seems impossible even for computer scientists to keep up with the evolution of technologies. Fifteen years ago, smart phones were just a dream and people were reluctant to go online for many things. Nowadays, everything seems to be moving to the virtual realm and as of today, 3 billion people have regular access to the Internet and to communication technologies. This number is equal to the world population in 1967. In this talk, we will look back at some of the predictions about the future of computing made by “experts” in recent years and will analyze the state of Ubiquitous Computing technologies using examples from current research projects not with virtual entities, but with real people in real cities.
Prof. Dr. Pedro José MARRÓN received his bachelor’s and master’s degrees in computer engineering from the University of Michigan in Ann Arbor in 1996 and 1998. At the end of 1999, he moved to the University of Freiburg in Germany to work on his Ph.D., which he received with honors in 2001. From 2003 until 2007, he worked at the University of Stuttgart as a senior researcher, leading the mobile data management and sensor network group. In 2007, he left Stuttgart to become a Professor of Computer Science at the University of Bonn, where he led the sensor networks and pervasive computing group. In 2009 he left Bonn to become a full Professor at the University of Duisburg-Essen. He is currently head of the “Networked Embedded Systems Group” (NES), which has almost 20 researchers working on fields related to ubiquitous computing. He is also co-founder of Locoslab GmbH, a spin-off of the University of Duisburg-Essen specialized in providing complete solutions for location-based services. Additionally, Pedro Marrón is the initiator and president of UBICITEC, the European Center for Ubiquitous Technologies and Smart Cities, which has over 20 institutional partners from industry and academia forming a virtual European Center with clear research and dissemination objectives. The goal of UBICITEC is to coordinate the research efforts on enabling technologies for Smart Cities, e.g. the Internet of Things, and to encourage the transfer of technology to industry.
09:00 | 5G-XHaul: Enabling scalable virtualization for future 5G Transport Networks
09:30 | Generalized Orchestration of IT/Cloud and Networks for SDN/NFV 5G Services
10:00 | Multi-tenancy architectures for heterogeneous resource allocations
10:30 | Graphein: A Novel Optical High-radix Switch Architecture for 3D Integration
11:00 | Online Resource Coalition Reorganization for Efficient Scheduling on the Intercloud
11:30 | Improving the Performance of Volunteer Computing with Data Volunteers: A Case Study with the ATLAS@home Project
12:00 | 3-additive Approximation Algorithm for Multicast Time in 2D Torus Networks
10:30 | Porting Matlab applications to high-performance C++ codes: CPU/GPU-accelerated spherical deconvolution of diffusion MRI data
11:00 | Modeling Performance of Hadoop Applications: A Journey from Queueing Networks to Stochastic Well Formed Nets
11:30 | D-SPACE4Cloud: A Design Tool for Big Data Applications
12:00 | On Stochastic performance and cost-aware optimal capacity planning of unreliable Infrastructure-as-a-Service cloud
10:30 | Traffic Sign Recognition Based on Parameter-free Detector and Multi-modal Representation
10:55 | Statistical analysis of CCM.M-K1 International Comparison based on Monte Carlo method
11:20 | Secure data access in Hadoop using elliptic curve cryptography
11:45 | The Research of Recommendation System based on User-Trust Mechanism and Matrix decomposition
12:10 | Reversible data hiding using non-local means prediction
Supercomputing Co-Design Technology Workshop (SCDT-2016)
Co-Design for Extreme Scale Computing through Runtime Technologies
Thomas Sterling
Intelligent Systems Engineering Department, Professor
Center for Research in Extreme Scale Technologies Indiana University, Director
Abstract
As HPC enters the 100-Petaflops era with the introduction of the TaihuLight computer in China, the co-design of algorithms, hardware architecture, and enabling system software is becoming imperative. The recent generations of systems have exemplified what has been convenient for hardware technologies for maximum flops and reduction of energy, but not application execution efficiency or user productivity. This is particularly true for those problems that require strong scaling or have significant dynamic components. Runtime system software offers an opportunity to address these and other challenges but suffers from overhead costs that hinder its effectiveness, at least for some problems requiring exploitation of near fine-grain parallelism. Co-design of system hardware and software architecture with dynamic adaptive application requirements may drive advances of future classes of scalable systems for extreme scale. This presentation will describe the challenges and the gaps between system architecture and dynamic execution that may be bridged through advances in algorithms, runtime software, and hardware architecture design. Results from recent research with the HPX-5 runtime software system and consideration of investigation of FPGA runtime hardware support will be discussed and conclusions derived in support of advanced co-design principles. Questions are welcome from the audience throughout the presentation.
Brief Biography
Dr. Thomas Sterling holds the position of Professor of Informatics and Computing at the Indiana University (IU) School of Informatics and Computing Department of Intelligent Systems Engineering (ISE) and serves as Director of the IU Center for Research in Extreme Scale Technologies (CREST). Since receiving his Ph.D. from MIT in 1984 as a Hertz Fellow, Dr. Sterling has engaged in applied research in parallel computing system structures, semantics, and operation in industry, government labs, and academia. Dr. Sterling is best known as the "father of Beowulf" for his pioneering research in commodity/Linux cluster computing, for which he shared the Gordon Bell Prize in 1997. He led the HTMT Project sponsored by multiple agencies to explore advanced technologies and their implication for high-end computer system architectures. Other research projects to which he contributed include the DARPA DIVA PIM architecture project with USC-ISI, the DARPA HPCS program sponsored Cray-led Cascade Petaflops architecture, and the Gilgamesh high-density computing project at NASA JPL. Sterling is currently involved in research associated with the innovative ParalleX execution model for extreme scale computing to establish the foundation principles guiding the development of future generation Exascale computing systems. ParalleX is currently the conceptual centerpiece of the XPRESS project as part of the DOE X-stack program and has been demonstrated via the proof-of-concept HPX-5 runtime system software. Dr. Sterling is the co-author of six books and holds six patents. He was the recipient of the 2013 Vanguard Award and is a Fellow of the AAAS. He is also co-guest editor with Bill Gropp of the HPCwire Exascale Edition.
10:30 | Co-Design for Extreme Scale Computing through Runtime Technologies
11:30 | System monitoring-based holistic resource utilization analysis for every user of a large HPC center
11:50 | Automated Parallel Simulation of Heart Electrical Activity Using Finite Element Method
12:10 | Using hStreams programming library for accelerating a real-life application on Intel MIC
10:30 | Seer: Empowering Software Defined Networking with Data Analytics
11:00 | Multi-Domain Orchestration for NFV: Challenges and Research Directions
11:30 | Baguette: Towards end-to-end service orchestration in heterogeneous networks
12:00 | Energy efficient orchestration of virtual services in 5G integrated fronthaul/backhaul infrastructures
14:00 | Intel Software Development Tools for Parallel Computing
14:30 | The Impact of Panel Factorization on the Gauss-Huard Algorithm for the Solution of Linear Systems on Modern Architectures
15:00 | Improving Hash Distributed A* for shared-memory architectures using abstraction
15:30 | Leveraging the Performance of LBM-HPC for Large Sizes on GPUs using Ghost Cells
16:00 | Hardware-Based Sequential Consistency Violation Detection Made Simpler
14:30 | Improving the energy efficiency of Evolutionary Multiobjective algorithms
15:00 | Network-aware Optimization of MPDATA on Homogeneous Multi-core Clusters with Heterogeneous Network
15:30 | Comparative Analysis of OpenACC Compilers
14:30 | Efficient Distributed Computations with DIRAC
14:50 | The Co-design of Astrophysical Code for Massively Parallel Supercomputers
15:10 | Generalized Approach to Scalability Analysis of Parallel Applications
15:30 | Co-design of a particle-in-cell plasma simulation code for Intel Xeon Phi: a first look at Knights Landing
14:30 | Implementation of the Beamformer Algorithm for the NVIDIA Jetson
15:00 | Efficiency of GPUs for Relational Database Engine Processing
15:30 | MARL-Ped+Hitmap: Towards improving agent-based simulations with distributed arrays
14:30 | Improved Track Path Method in Real Time by using GPS and Accelerometer Data
15:00 | An Event-Based Approach for Discovering Activities of Daily Living by Hidden Markov Models
15:30 | A Learning System to Support Social and Empathy Disorders Diagnosis through Affective Avatars
16:00 | Analysis of the Innovation Outputs in mHealth for Patient Monitoring
16:30 | Shared Memory Tile-based vs Hybrid Memory GOP-based Parallel Algorithms for HEVC Encoder
17:00 | GPU-Based Heterogeneous Coding Architecture for HEVC
17:30 | Optimizing GPU code for CPU execution using OpenCL and vectorization: a case study on image coding
18:00 | Efficient Parallel Algorithm for Optimal DAG Structure Search on Parallel Computer with Torus Network
16:30 | A parallel model for heterogeneous cluster
17:00 | Formalizing Data Locality in Task Parallel Applications
17:30 | OTFX: An In-memory Event Tracing Extension to the Open Trace Format 2
18:00 | Tuning the Blocksize for Dense Linear Algebra Factorization Routines with the Roofline Model
16:30 | Hardware-Specific Selection the Most Fast-Running Software Components
16:50 | Educational and Research Systems for Evaluating the Efficiency of Parallel Computations
17:10 | Workshop Closing
16:30 | Exploring a Distributed Iterative Reconstructor based on Split Bregman using PETSc
17:00 | I/O-focused Cost Model for the Exploitation of Public Cloud Resources in Data-Intensive Workflows
17:30 | Geocon: A Middleware for Location-aware Ubiquitous Applications
16:30 | Study of Wearable And 3D-Printable Vibration-Based Energy Harvesters
17:00 | Activity Recognition in a Home Setting using Off the Shelf Smart Watch Technology
17:30 | The Effectiveness of Upward and Downward Social Comparison of Physical Activity in an Online Intervention
18:00 | First Approach to Automatic Performance Status Evaluation and Physical Activity Recognition in Cancer Patients
Rafael Asenjo
- Associate Professor at the Dept. of Computer Architecture, Univ. Malaga
Heterogeneous computing is seen as a path forward to deliver the energy and performance improvements needed over the next decade. To that end, heterogeneous systems feature GPUs (Graphics Processing Units) or FPGAs (Field Programmable Gate Arrays) that excel at accelerating complex tasks while consuming less energy. There are also heterogeneous architectures on-chip, like the processors developed for mobile devices (laptops, tablets and smartphones), which comprise multiple cores and a GPU. More recently, some architectures have also paired multicores with an FPGA on the same die. Examples of the latter are Xilinx Zynq (2 cores Cortex-A9 + FPGA), Xilinx UltraScale+ MPSoC (4 cores Cortex-A53 + GPU Mali 400 + FPGA) or Intel HARP (12 cores Xeon + FPGA).
This talk covers hardware and software aspects of this kind of heterogeneous architecture. Regarding the HW, we briefly discuss the underlying architecture of some heterogeneous chips composed of multicores+GPU and multicores+FPGA, delving into the differences between both kinds of accelerators and how to measure the energy they consume. We also address the different solutions to get a coherent view of the memory shared between the cores and the GPU or between the cores and the FPGA. With respect to the SW, different heterogeneous programming models will be introduced, paying more attention to those that are aimed at exploiting several devices at the same time (CPU + GPU or CPU + FPGA). Again, the different optimization techniques and the levels of parallelism that are suitable for the GPU and for the FPGA will be identified. Finally, we present our own proposal that tackles heterogeneous execution of applications based on the pipeline and parallel_for patterns. We discuss our extensions to the Threading Building Blocks (TBB) pipeline and parallel_for templates to automatically distribute the workload between the multicore and the accelerator, taking care of the load balancing and considering energy consumption in the scheduling policies. We evaluate the performance and energy efficiency of the different approaches for several heterogeneous processors: Intel Ivy Bridge, Intel Haswell, Samsung Exynos 5 Octa, Xilinx Zynq and Intel Broadwell + Altera Stratix V FPGA.
Rafael Asenjo received his B.S. and M.S. degrees in Telecommunications Engineering in 1993 and his Ph.D. degree in 1997, all from the University of Malaga, Spain. He has been an Associate Professor at the Dept. of Computer Architecture, Univ. Malaga, Spain, since 2001, where he leads a research group working on “productivity” in the context of high performance computing. He collaborated on the IBM XL-UPC compiler in 2008 and has contributed to Cray’s Chapel runtime development since 2011. This year he served as General Chair of ACM PPoPP’16 and has also served as Program Committee member for IPDPS’13, IPDPS’14 and SC’15. His research interests include programming models, parallelization of irregular codes, parallel IO, parallelizing compilers and heterogeneous architectures.
10:30 | Impact of Shutdown Techniques for Energy-Efficient Cloud Data Centers
11:00 | Processing Partially Ordered Requests in Distributed Stream Processing Systems
11:30 | Microcities: a Platform based on Microclouds for Neighborhood Services
12:00 | Implement and Optimization of Indoor Positioning System Based on Wi-Fi Signal
10:30 | OBC Based Optimization of Re-encryption for Cryptographic Cloud Storage
11:00 | Dynamic Verifiable Search over Encrypted Data in Untrusted Clouds
11:30 | Reducing TCB of Linux Kernel Using User-Space Device Driver
10:30 | Energy-Aware Query Processing on Parallel Database Cluster Nodes
11:00 | Current flow betweenness centrality with Apache Spark
11:30 | Optimizing Inter-server Communications by Exploiting Overlapping Communities in Online Social Networks
10:30 | On a Parallel Algorithm for the Determination of Multiple Optimal Solutions for the LCSS Problem
11:00 | GPU computing to speed-up the resolution of microrheology models
11:30 | Locality of Computation for Stencil Optimization
12:00 | Bin Recycling Strategy for an Accuracy-Aware Implementation of Two-Point Angular Correlation Function on GPU
Conference closing and next location presentations.