09:00-10:00 Session 4: Keynote

Keynote 2

Low-level Fun with Parallel Runtime Systems

ABSTRACT. In this talk, we will look at how a parallel programming model interacts with contemporary multi-core and many-core processors. When thinking about low-level runtime implementations, you will see that the traditional assumptions of parallel programming may not hold, and why a slightly different perspective on the topic and on how to obtain performance is required. After a quick recap of some OpenMP API features and of how a compiler implements this programming model, we will turn towards interesting runtime questions. We will show what problems waiting threads pose when they have to be woken up again. We will also exemplify several algorithms (e.g., for locks and barriers) and show how the processor design influences these algorithms. We will discuss some interesting effects of the modern cache hierarchy on the performance of such algorithms.

10:00-10:30 Coffee Break
10:30-11:30 Session 5: Applied Machine Learning
Orchestrated Co-Scheduling, Resource Partitioning, and Power Capping on CPU-GPU Heterogeneous Systems via Machine Learning

ABSTRACT. CPU-GPU heterogeneous architectures are now commonly used in a wide variety of computing systems, from mobile devices to supercomputers. Maximizing the throughput for multi-programmed workloads on such systems is indispensable, as a single program typically cannot fully exploit all available resources. At the same time, power consumption is a key issue and often requires optimizing power allocations to the CPU and GPU while enforcing a total power constraint, in particular when the power/thermal requirements are strict. The result is a system-wide optimization problem with several knobs. In particular, we focus on (1) co-scheduling decisions, i.e., selecting programs to co-locate in a space-sharing manner; (2) resource partitioning on both CPUs and GPUs; and (3) power capping on both CPUs and GPUs. We solve this problem with machine-learning-based predictive performance modeling in order to optimize the above knobs in a coordinated manner. Our experimental results on a real system show that our approach achieves a speedup of up to 67% compared to time-sharing-based scheduling with naive power capping that evenly distributes power budgets across components.
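
The power-capping knob in this abstract can be pictured as a search over CPU/GPU power splits under a total cap. The following is a minimal sketch, not the authors' method: `predicted_throughput` is a hypothetical stand-in for a learned performance model, and the cap ranges, step size, and budget are illustrative assumptions.

```python
from itertools import product


def predicted_throughput(cpu_cap_w: float, gpu_cap_w: float) -> float:
    """Hypothetical stand-in for a learned performance model.

    Assumes diminishing returns per watt, with the GPU benefiting more
    from additional power than the CPU (an illustrative assumption).
    """
    return cpu_cap_w ** 0.5 + 2.0 * gpu_cap_w ** 0.5


def best_power_split(total_budget_w, cpu_range_w, gpu_range_w, step_w=10):
    """Exhaustively search CPU/GPU caps whose sum stays within the budget."""
    cpu_caps = range(cpu_range_w[0], cpu_range_w[1] + 1, step_w)
    gpu_caps = range(gpu_range_w[0], gpu_range_w[1] + 1, step_w)
    feasible = [
        (cpu, gpu)
        for cpu, gpu in product(cpu_caps, gpu_caps)
        if cpu + gpu <= total_budget_w
    ]
    # Pick the feasible pair with the highest predicted throughput.
    return max(feasible, key=lambda caps: predicted_throughput(*caps))


total = 200  # total power budget in watts (illustrative)
even_split = (total // 2, total // 2)  # naive baseline: equal shares
tuned_split = best_power_split(total, cpu_range_w=(40, 150), gpu_range_w=(40, 150))
```

With this toy model, the tuned split shifts power towards the GPU instead of dividing the budget evenly. A real system would replace the toy function with a trained predictor and add the co-scheduling and resource-partitioning knobs to the search space.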

FPGA-based Dynamic Deep Learning Acceleration for Real-time Video Analytics

ABSTRACT. Deep neural networks (DNNs) are a key technique in modern artificial intelligence that has provided state-of-the-art accuracy on many applications, and they have received significant interest. Together with the rapid development of AI techniques, the growing ubiquity of smart devices and autonomous robot systems is placing heavy demands on DNN inference hardware, which must deliver high energy and computing efficiency. The high energy efficiency, computing capabilities, and reconfigurability of FPGAs make them a promising platform for hardware acceleration of such computing tasks. This paper addresses this challenge and proposes a new flexible hardware accelerator framework to enable adaptive support for various DL algorithms on an FPGA-based edge computing platform. This framework allows run-time reconfiguration to increase the power and computing efficiency of both the DNN model/software and the hardware, to meet the requirements of dedicated application specifications and operating environments.

11:30-12:30 Session 6: Advanced Computing Techniques and Organic Computing (OC1)
Effects of Approximate Computing on Workload Characteristics
PRESENTER: Daniel Maier

ABSTRACT. In recent years, many new approaches in approximate computing have been presented. These techniques have shown great opportunities to improve applications' performance or energy consumption in exchange for an often negligible decrease in accuracy. Many different techniques have been invented, ranging from hardware techniques to approaches purely implemented in software, compilers and frameworks. Research has shown that applications often have very specific demands on hardware and usually suffer from specific bottlenecks such as memory bandwidth or compute capabilities. Developers optimize applications using advanced techniques in order to optimally exploit hardware capabilities. We study how the workload character of applications is affected when they are optimized using approximate computing techniques. We analyze the detailed micro-architectural character of each application by measuring 55 characteristics. We evaluate four approximate computing techniques on all 37 applications from PolyBench/C and AxBench. We show how optimization with the different approximation techniques influences the properties of individual applications as well as of groups of applications with similar character. We find results that contradict general expectations, e.g., an increasing number of executed instructions when the opposite is expected. Furthermore, some applications are slowed down when approximated, which is confirmed by the number of executed instructions. These results show that approximating applications changes their core characteristics and that a detailed analysis of approximated applications is required. Interference between traditional optimizations and approximation requires a holistic approach.

NDNET: a Unified Framework for Anomaly and Novelty Detection

ABSTRACT. NDNET is an anomaly and novelty detection library that implements various detection algorithms adjusted for online processing of data streams. The intention of this library is threefold: 1) Make experimentation with different anomaly and novelty detection algorithms simple. 2) Support the development of new novelty detection approaches by providing the mCANDIES framework. 3) Provide fundamentals to analyze and evaluate novelty detection algorithms on data streams. The library is freely available and developed as open-source software.

12:30-13:30 Lunch Break
13:30-15:00 Session 7: Organic Computing (OC2): Systems & Applications

OC 2

GAE-LCT: A run-time GA-based Classifier Evolution Method for Hardware LCT controlled SoC Performance-Power Optimization

ABSTRACT. Learning classifier tables (LCTs) are classifier-based, lightweight hardware reinforcement learning building blocks which inherit the concepts of learning classifier systems. LCTs are used as per-core low-level controllers to learn and optimize potentially conflicting objectives, e.g., achieving a performance target under a power budget. A supervisor at the system level translates system and application requirements into objectives for the LCTs. The classifier population in the LCTs has to be evolved at run-time to adapt to changes in the mode, performance targets, constraints or workload being executed. Towards this goal, we present GAE-LCT, a genetic algorithm (GA) based classifier evolution method for hardware learning classifier tables. The GA uses accuracy to evolve classifiers at run-time. We introduce extensions to the LCT to enable an accuracy-based genetic algorithm. The GA runs as a software process on one of the cores and interacts with the hardware LCT via interrupts. We evaluate our work using DVFS on an FPGA using Leon3 cores. We demonstrate GAE-LCT’s ability to generate accurate classifiers at run-time from scratch. GAE-LCT achieves 5% lower difference to the IPS reference and 51.5% lower power budget overshoot compared to a Q-table while requiring 75% less memory. The hybrid GAE-LCT also requires 12 times less software overhead compared to a full software implementation.

Organic Computing to Improve the Dependability of an Automotive Environment

ABSTRACT. The aim of our research is to implement and evaluate Organic Computing methods in a real vehicle environment and thus increase the reliability of the vehicle's steering system. For this purpose, the steering system is simulated using an Organic Computing approach, which is extended with explicit fault diagnosis capabilities. In order to achieve the goals set by the demands of the automotive industry, several development steps are necessary. This paper outlines the research gaps and potentials as well as the approaches we envision to tackle these challenges. The starting point is a general concept involving three Electronic Control Units (ECUs) that act as active redundancy in the event of a component failure, e.g., an ECU failure. Next, the Organic Computing middleware, in the form of an Artificial Hormone System in combination with Artificial DNA, will be implemented and examined on real automotive hardware, resulting in a highly reliable, resource- and cost-efficient fail-operational system. Beyond the classical working principles of Organic Computing to overcome ECU failures, our project implements system-level fault diagnosis to additionally monitor ECU performance in the context of the steering system. Forecasting, detecting and identifying faults in the system enables fault-specific recovery actions, e.g., preventive service migration or degradation. Overall, we combine promising related work into an Organic Computing approach for a vehicle steering system and, through an industrial partnership, aim for a functional ECU. After successful deployment, tests will be documented with suitable vehicle hardware in a corresponding simulation environment. This paper provides insight into the early stages of our research project.

A context aware and self-improving monitoring system for field vegetables

ABSTRACT. Camera-based vision systems are becoming increasingly influential in the advancing automation of agriculture. Smart Farming technologies such as selective mechanical or chemical weeding are already firmly implemented processes in practice utilizing intelligent camera technology. The capabilities of such technologies recently advanced with the implementation of Deep Learning-based Computer Vision algorithms, which proved their applicability in the agricultural domain by successfully solving classification, object detection and segmentation tasks. Due to the demanding environment of agricultural fields and the increasing dependence of farmers on the correct and reliable functioning of such systems, we propose to utilize the agronomic context of a field to obtain a self-improving system for camera-based detection and segmentation of cabbage plants. For this purpose, we trained and tested a neural network for instance segmentation (Mask R-CNN) with different datasets of white cabbage (Brassica oleracea). In our work, the relevant context parameters are the expected height of the plants as well as the color of the plant pixels. A cost-efficient camera setup utilizing a Structure from Motion (SfM) approach was used to gain complementary depth images. Knowledge gaps in our system, appearing in the form of missed or poorly detected and segmented plants, can be closed by means of an Active Learning approach. This leads to an improvement in our experiments of up to 27.2% in terms of mean average precision.

15:00-15:30 Coffee Break
15:30-17:00 Session 8: Organic Computing (OC3): Learning Capabilities & Quantum Computing

OC 1

QPU-System Co-Design for Quantum HPC Accelerators

ABSTRACT. The use of quantum processing units (QPUs) promises speed-ups for solving computational problems, but the quantum devices currently available possess only a very limited number of qubits and suffer from considerable imperfections. One possibility to progress towards practical utility is to use a co-design approach: Problem formulation and algorithm, but also the physical QPU properties are tailored to the specific application. Since QPUs will likely be used as accelerators for classical computers, details of systemic integration into existing architectures are another lever to influence and improve the practical utility of QPUs.

In this work, we investigate the influence of different parameters on the runtime of quantum programs on tailored hybrid CPU-QPU-systems. We study the influence of communication times between CPU and QPU, how adapting QPU designs influences quantum and overall execution performance, and how these factors interact. Using a simple model that allows for estimating which design choices should be subjected to optimisation for a given task, we provide an intuition to the HPC community on potentials and limitations of co-design approaches. We also discuss physical limitations for implementing the proposed changes on real quantum hardware devices.
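
To give a feel for how communication time and QPU execution time can interact, here is a minimal sketch of a runtime estimate for a hybrid loop. It is an illustrative assumption, not the model used in the paper: it assumes an iterative program in which every iteration pays for a classical step, one CPU-QPU round trip, and a number of circuit repetitions. All parameter names and values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HybridRun:
    iterations: int          # classical optimisation steps
    shots: int               # circuit repetitions per iteration
    circuit_time_s: float    # time for one circuit execution on the QPU
    comm_latency_s: float    # one CPU<->QPU round trip per iteration
    cpu_step_s: float        # classical work per iteration

    def total_runtime_s(self) -> float:
        """Sum the classical, communication and quantum time of all iterations."""
        per_iteration = (
            self.cpu_step_s
            + self.comm_latency_s
            + self.shots * self.circuit_time_s
        )
        return self.iterations * per_iteration

    def comm_fraction(self) -> float:
        """Share of the total runtime spent on communication."""
        return self.iterations * self.comm_latency_s / self.total_runtime_s()


# Same workload, once with a loosely coupled and once with a tightly coupled QPU.
loose = HybridRun(iterations=100, shots=1000, circuit_time_s=1e-4,
                  comm_latency_s=0.5, cpu_step_s=0.01)
tight = HybridRun(iterations=100, shots=1000, circuit_time_s=1e-4,
                  comm_latency_s=0.001, cpu_step_s=0.01)
```

In the loosely coupled case, communication dominates, so a faster QPU would help little. In the tightly coupled case, the circuit time becomes the term worth optimising. This is the kind of trade-off a simple model can expose before any hardware is changed.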

Semi-Model-Based Reinforcement Learning in Organic Computing Systems

ABSTRACT. Reinforcement Learning (RL) can generally be distinguished into two main classes: model-based and model-free. While model-based approaches use some kind of model of the environment and exploit it for learning, model-free methods learn with the complete absence of a model. Interpolation-based RL, and more specifically Interpolated Experience Replay (IER), comes with some properties that fit very well into the domain of Organic Computing (OC). We demonstrate how an OC system can benefit from this concept and attempt to place IER into one of the two RL classes. To do so, we give a broad overview of how both of the terms (model-based and model-free) are defined and detail different model-based categorizations. It turns out that replay-based techniques are quite on the edge between both. Furthermore, even if interpolation based on stored samples could be classified as a kind of model, the general way of using the interpolated experiences remains replay-based. Here, the borders get blurry and the classes overlap. In conclusion, we define a third class: semi-model-based. Additionally, we show that some architectural approaches of the OC domain fit this new class very well and even encourage such methods.
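
As a rough illustration of why interpolation on stored samples sits between a plain replay memory and an environment model, the sketch below stores real transitions and synthesises an extra experience by averaging the rewards observed for the same state-action pair. The averaging rule and all class and method names are assumptions made for illustration, not the authors' implementation.

```python
import random
from collections import defaultdict


class InterpolatedReplayBuffer:
    """Replay buffer that can also emit averaged (synthetic) experiences.

    Hypothetical sketch: no eviction of old samples, to keep it short.
    """

    def __init__(self, seed=None):
        self.real = []                       # observed transitions
        self.by_pair = defaultdict(list)     # (state, action) -> transitions
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state):
        transition = (state, action, reward, next_state)
        self.real.append(transition)
        self.by_pair[(state, action)].append(transition)

    def interpolate(self, state, action):
        """Build a synthetic transition from stored samples, or None."""
        samples = self.by_pair.get((state, action))
        if not samples:
            return None
        mean_reward = sum(r for _, _, r, _ in samples) / len(samples)
        # Reuse a stored successor state; only the reward is interpolated.
        _, _, _, next_state = self.rng.choice(samples)
        return (state, action, mean_reward, next_state)

    def sample(self, batch_size):
        """Return each drawn real transition followed by a synthetic one."""
        batch = []
        for _ in range(batch_size):
            s, a, r, s2 = self.rng.choice(self.real)
            batch.append((s, a, r, s2))
            batch.append(self.interpolate(s, a))
        return batch
```

The learner still consumes experiences through replay, yet the averaged reward is a small statistical model of the environment. That overlap is what motivates a class between model-based and model-free.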

Deep Reinforcement Learning with a Classifier System – First Steps

ABSTRACT. Organic Computing enables self-* properties in technical systems for mastering them in the face of complexity and for improving robustness and efficiency. A key technology for self-improving adaptation decisions is reinforcement learning (RL). In this paper, we argue that traditional deep RL concepts are not applicable due to their limited interpretability. In contrast, approaches from the field of rule-based evolutionary RL are less powerful. We propose to fuse both technical concepts while maintaining their advantages, making the result especially suited for Organic Computing applications. We present initial steps and a first evaluation on standard RL scenarios.