Abstract
Parallel computing is ubiquitous and can be found in a wide variety of applications, from high-performance computing to embedded systems. A key factor across all these areas is energy efficiency, which denotes the number of computations that can be performed per unit of energy. Hence, customization and a tight co-design of architecture and compiler are crucial for scaling future systems further.
This talk presents tightly coupled processor arrays (TCPAs), a class of massively parallel arrays of locally interconnected processing elements (PEs), as well as corresponding compilation concepts. TCPAs differ from coarse-grained reconfigurable arrays (CGRAs) in that the PEs are programmable, utilizing small instruction memories. They allow multiple loop dimensions of many computationally intensive applications to be executed in parallel, rather than just the innermost one. Besides introducing the main architectural building blocks of these arrays, the presentation covers the corresponding application mapping, which starts from a functional programming language and involves symbolic loop compilation. In this approach, the loop bounds and the number of available PEs can be unknown at compile time. Finally, the talk reports on the research endeavor of prototyping an 8x8 TCPA instance as a chip manufactured in 22 nm technology.
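As a minimal illustration of the symbolic loop compilation idea mentioned above, the sketch below keeps both the loop bound N and the number of PEs P symbolic until run time; the symbol names and the use of SymPy are illustrative assumptions, not the actual TCPA toolchain.

```python
# Minimal sketch (not the TCPA compiler): symbolic tiling of a 1-D loop where
# the loop bound N and the number of PEs P stay symbolic until run time.
from sympy import symbols, ceiling

N, P = symbols("N P", positive=True, integer=True)

# Each PE receives a contiguous tile of ceil(N / P) iterations; the symbolic
# expression is kept and only evaluated once N and P become known.
tile_size = ceiling(N / P)
print("symbolic tile size:", tile_size)

# Late binding: e.g., N = 1024 iterations on an 8x8 array (64 PEs).
print("concrete tile size:", tile_size.subs({N: 1024, P: 64}))  # -> 16
```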
Bio
Frank Hannig received a Diploma degree in an interdisciplinary course of study in electrical engineering and computer science from the University of Paderborn, Germany, in 2000; a Ph.D. degree (Dr.-Ing.) and a Habilitation degree in computer science from Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Germany, in 2009 and 2018, respectively. He has led the Architecture and Compiler Design Group in the Computer Science Department at FAU since 2004. His primary research interests are the design of massively parallel architectures, ranging from dedicated hardware to multicore architectures, mapping methodologies for domain-specific computing, and architecture/compiler co-design. He has authored or co-authored more than 200 peer-reviewed publications. Dr. Hannig has served on the program committees of several international conferences (ARC, ASAP, CODES+ISSS, DAC, DATE, DASIP, SAC, SAMOS) and is an associate editor of the Journal of Real-Time Image Processing and IEEE Embedded Systems Letters. He is a Senior Member of the IEEE and an affiliate member of the European Network of Excellence on High Performance and Embedded Architecture and Compilation (HiPEAC).
10:30 | Idle is the New Sleep: Configuration-Aware Alternative to Powering Off FPGA-Based DL Accelerators During Inactivity

ABSTRACT. In the rapidly evolving Internet of Things (IoT) domain, we concentrate on enhancing energy efficiency in Deep Learning accelerators on FPGA-based heterogeneous platforms. Instead of focusing on the inference phase, we introduce innovative optimizations to minimize the overhead of the FPGA configuration phase. By fine-tuning configuration parameters correctly, we achieved a 40.13-fold reduction in configuration energy. Moreover, augmented with power-saving methods, our Idle-Waiting strategy outperformed the traditional On-Off strategy in duty-cycle mode for request periods up to 499.06 ms. Specifically, at a 40 ms request period within a 4147 J energy budget, this strategy extends the system lifetime to approximately 12.39 times that of the On-Off strategy. Empirically validated through hardware measurements and simulations, these optimizations provide valuable insights and practical methods for achieving energy-efficient deployments in IoT.
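As a rough illustration of the trade-off described in this abstract, the back-of-the-envelope sketch below compares an On-Off strategy (which pays reconfiguration energy every request) with an Idle-Waiting strategy (which pays idle power between requests) under a fixed energy budget. All per-period energy and power figures are hypothetical placeholders, not values from the paper.

```python
# Hedged back-of-the-envelope model (numbers are hypothetical, not from the paper).
BUDGET_J = 4147.0   # energy budget stated in the abstract
PERIOD_S = 0.040    # 40 ms request period stated in the abstract

# Hypothetical per-request figures, for illustration only.
E_CONFIG_J = 0.50   # FPGA configuration energy paid by On-Off for every request
E_INFER_J  = 0.02   # inference energy, paid by both strategies
P_IDLE_W   = 0.30   # idle power of the configured device (Idle-Waiting)

e_on_off = E_CONFIG_J + E_INFER_J            # energy per request, On-Off
e_idle   = E_INFER_J + P_IDLE_W * PERIOD_S   # energy per request, Idle-Waiting

lifetime_on_off = BUDGET_J / e_on_off * PERIOD_S
lifetime_idle   = BUDGET_J / e_idle * PERIOD_S
print(f"lifetime ratio Idle-Waiting / On-Off: {lifetime_idle / lifetime_on_off:.1f}x")
```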
10:55 | On-the-fly CT Image Pre-Processing on MPSoC-FPGAs

PRESENTER: Daniele Passaretti

ABSTRACT. Due to the increasing number of tumors, new interventional Computed Tomography (CT) procedures have been proposed that aim to optimize workflow and enable time-effective diagnosis and treatment. To support tumor ablation, CT scanners must pre-process 2D projections and reconstruct 3D slices of the human body in real time, while data are acquired. This paper proposes a lightweight processing architecture for MPSoC-FPGAs that performs the "CT pre-processing phase" on the fly; this phase consists of the pixel-wise processing of the 2D projections. The architecture is also suitable for exploring different data formats that can be selected at design time to improve performance while preserving image quality. This article focuses on the cosine and redundancy weighting steps, which cannot be implemented following the standard method on embedded MPSoC-FPGAs due to the high resource utilization costs of their arithmetic operations. Therefore, this work proposes different optimizations that reduce the number of operations to compute and the amount of on-chip memory required compared with the standard algorithm. Finally, the proposed architecture has been implemented and instantiated within a Control Data Acquisition System (CDAS) architecture running on the XC7Z045 AMD-Xilinx MPSoC-FPGA and integrated into an open-interface CT scanner assembled in our laboratory. Here, the optimized weighting steps use up to 33.8 times fewer DSPs than the implementation based on the standard solution. Furthermore, they add only 80 ns of latency, making them 7.9 times faster than the standard implementation.
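For context, the cosine weighting step is, in its textbook form, a per-pixel multiplication with a geometry-dependent factor. The sketch below shows this standard formulation; the paper's FPGA-friendly reformulation avoids the per-pixel square root and division and may differ in detail.

```python
# Textbook-style cosine pre-weighting of a 2-D projection (NumPy sketch);
# not necessarily the exact formulation used in the paper.
import numpy as np

def cosine_weight(projection, du, dv, sdd):
    """projection: 2-D array of detector pixels,
    du, dv: detector pixel pitch, sdd: source-to-detector distance (same unit)."""
    rows, cols = projection.shape
    u = (np.arange(cols) - (cols - 1) / 2.0) * du
    v = (np.arange(rows) - (rows - 1) / 2.0) * dv
    uu, vv = np.meshgrid(u, v)
    w = sdd / np.sqrt(sdd**2 + uu**2 + vv**2)  # cosine of the angle to the central ray
    return projection * w

# The weights depend only on the detector geometry, so a hardware design can
# precompute or incrementally update them instead of evaluating sqrt per pixel.
```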
11:20 | AccProF: Increasing the Accuracy of Embedded Application Profiling using FPGAs

ABSTRACT. Accurate software profiling is an essential step in the development of embedded systems. The accuracy of the collected profiling data is critically important for embedded systems that operate under fixed timing constraints, which, if not met, could lead to system failure. Existing profiling solutions targeting embedded systems introduce an overhead to the running application that distorts the collected profiling data. This paper proposes AccProF for Systems on Chip with integrated FPGAs. AccProF is an FPGA-assisted profiling framework that combines compiler support and bespoke hardware. AccProF is composed of (1) an LLVM pass that inserts lightweight instrumentation into the application binary running on general-purpose processors, and (2) FPGA-based hardware capable of performing offloaded profiling. Offloading part of the profiling tasks, as well as data structures, onto the FPGA reduces pollution of the collected profiling data, leading to higher accuracy. This paper focuses on control-flow graph profiling and evaluates AccProF on a range of benchmarks ported to the seL4 microkernel running on the AMD Zynq MPSoC. We measure performance metrics of these benchmarks across a range of processor statistics, including cycles and instruction and data cache misses. We show that the impact on certain metrics is reduced by up to 5× when compared against an equivalent software-based framework.
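The core idea of offloaded profiling can be illustrated very roughly as follows: the instrumented software performs a single store of a basic-block identifier into a memory-mapped region, and counter maintenance happens in hardware. The sketch below is purely conceptual; the base address, register layout, and use of /dev/mem are hypothetical and unrelated to AccProF's actual implementation.

```python
# Conceptual sketch only (hypothetical addresses and register layout):
# one memory-mapped store per basic block, no software-side counters.
import mmap
import os
import struct

FPGA_PROFILER_BASE = 0xA0000000  # hypothetical base address of an FPGA profiling block
PAGE = 4096

fd = os.open("/dev/mem", os.O_RDWR | os.O_SYNC)
region = mmap.mmap(fd, PAGE, offset=FPGA_PROFILER_BASE)

def on_basic_block(block_id: int) -> None:
    """Lightweight instrumentation stub: a single 32-bit store."""
    region[0:4] = struct.pack("<I", block_id)

on_basic_block(42)  # the FPGA-side logic would increment its counter for block 42
```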
16:00 | Enhancing Maritime Behaviour Analysis through Novel Feature Engineering and Digital Shadow Modelling: A Case Study in Kiel Fjord

ABSTRACT. With the continuous evolution of maritime technology, there is a growing need for analysing and modelling vessel behaviour in complex waterway systems. This paper presents an extension and utilisation of the Surface Vessel Nautical Behaviour Analysis (SV-NBA) framework for in-depth spatio-temporal analysis of maritime surface vessels' behaviour in Kiel Fjord. Leveraging one year of collected Automatic Identification System (AIS) data, we extracted features from the recorded data. Three feature sets are generated and compared using expert-knowledge features, methods for feature selection, and a Denoising Autoencoder latent space representation. Behaviour modelling and analysis utilise clustered data from the three feature sets, employing Gaussian Mixture Models (GMMs). The trained GMM models serve as digital shadows, enabling a storage-efficient representation of vessel behaviour and facilitating applications such as online situational awareness and maritime traffic management. These digital shadows serve as an observer layer in a dynamic autonomous system of systems, offering insights into maritime activities and enhancing navigation safety in busy waterways. This research contributes to the advancement of autonomous navigation systems and supports efficient maritime traffic management strategies.
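To illustrate the digital-shadow idea, a fitted GMM can be reduced to its parameters and later used to score new observations. The sketch below uses scikit-learn and a placeholder feature layout; it is not the SV-NBA framework itself.

```python
# Minimal sketch (assumed feature layout, not the SV-NBA framework): fit a GMM
# on per-vessel feature vectors and keep only its parameters as a "digital shadow".
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
features = rng.normal(size=(5000, 6))  # placeholder for engineered or latent features

gmm = GaussianMixture(n_components=8, covariance_type="full", random_state=0)
gmm.fit(features)

# The shadow is just means, covariances, and weights -- far smaller than raw AIS logs.
shadow = {"means": gmm.means_, "covs": gmm.covariances_, "weights": gmm.weights_}

# Online use: score a newly observed feature vector against the shadow.
new_obs = rng.normal(size=(1, 6))
print("log-likelihood of new observation:", gmm.score_samples(new_obs)[0])
```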
16:25 | Synthesizing Training Data for Intelligent Weed Control Systems Using Generative AI

PRESENTER: Sourav Modak

ABSTRACT. Deep Learning already plays a pivotal role in technical systems performing various crop protection tasks, including weed detection, disease diagnosis, and pest monitoring. However, the efficacy of such data-driven models heavily relies on large and high-quality datasets, which are often scarce and costly to acquire in agricultural contexts. To address the overarching challenge of data scarcity, augmentation techniques have emerged as a popular strategy to expand the amount and variation of training data. Traditional data augmentation methods, however, often fall short in reliably replicating real-world conditions and also lack diversity in the augmented images, hindering robust model training. In this paper, we introduce a novel methodology for synthetic image generation designed specifically for object detection tasks in the agricultural context of weed control. We propose a pipeline architecture for synthetic image generation that incorporates a foundation model, the Segment Anything Model (SAM), which allows for zero-shot transfer to new domains, along with the recent generative AI-based Stable Diffusion Model. Our methodology aims to produce synthetic training images that accurately capture characteristic weed and background features while replicating the authentic style and variability inherent in real-world images with high fidelity. In view of the integration of our approach into intelligent technical systems, such a pipeline paves the way for continual self-improvement of the perception modules when put into a self-reflection loop. First experiments on real weed image data from a current research project reveal our method's capability to reconstruct the innate features of real-world weed-infested scenes from an outdoor experimental setting.
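A conceptual outline of such a SAM-plus-Stable-Diffusion pipeline is sketched below; the checkpoint names, prompt, and naive choice of a single mask are assumptions for illustration and do not reproduce the authors' exact pipeline.

```python
# Conceptual outline only (model/checkpoint names are assumptions): SAM delineates
# plant regions, then a Stable Diffusion inpainting model re-synthesizes those
# regions to obtain new, style-consistent training images.
import numpy as np
from PIL import Image
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator
from diffusers import StableDiffusionInpaintPipeline

image = Image.open("field_scene.jpg").convert("RGB")

# 1) Zero-shot segmentation with SAM (checkpoint path assumed); naively take the first mask.
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")
masks = SamAutomaticMaskGenerator(sam).generate(np.array(image))
weed_mask = Image.fromarray((masks[0]["segmentation"] * 255).astype(np.uint8))

# 2) Generative re-synthesis of the masked region with an inpainting model.
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting")
synthetic = pipe(prompt="young weed seedling on bare soil, outdoor field photo",
                 image=image, mask_image=weed_mask).images[0]
synthetic.save("synthetic_weed_sample.jpg")
```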
16:50 | PRESENTER: Glen te Hofsté

ABSTRACT. On-board Computers (OBCs) are at the centre of space-faring systems. They provide computational performance to the system with high availability and dependability. However, these systems typically consist of expensive, slow, fault-tolerant hardware to cope with errors or failures during a mission. Commercial-off-the-shelf (COTS) components offer higher performance but do not provide the fault-tolerance mechanisms. The ScOSA (Scalable On-board Computing for Space Avionics) architecture uses COTS and rad-hard components as a distributed system, with the advantage of providing more computing performance than current OBCs while maintaining the dependability properties. ScOSA uses a middleware to manage the COTS components as a distributed system of nodes, which, in the event of a node failure, mitigates the effects by reconfiguring the system to a pre-determined configuration that excludes the failed node. These configurations are computed offline, and their memory usage grows exponentially with the number of nodes in the system, which limits the system's scalability. This paper presents an online reconfiguration algorithm as a solution to this scalability problem. Upon the occurrence of a node failure, the online algorithm makes scheduling decisions at run time, eliminating the need for pre-determined configurations. The novel online scheduling mechanism, consisting of six phases, combines fault tolerance, parallelism, and the use of the real-time state of the system, and is a step towards higher dependability in distributed on-board computing. The online reconfiguration is evaluated by comparing it to the offline reconfiguration in terms of time and network traffic, showing that it is not only capable of generating configurations dynamically but also provides a solution to the scalability problem.
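To illustrate the difference from looking up pre-determined configurations, the sketch below shows a deliberately simplified online reassignment on a node-failure event: a greedy, unit-cost heuristic. The actual ScOSA mechanism consists of six phases and is considerably more involved.

```python
# Hypothetical sketch (not the ScOSA algorithm): on a node-failure event, greedily
# reassign the failed node's tasks to the least-loaded surviving nodes.
def reconfigure(assignment, loads, failed_node):
    """assignment: {task: node}, loads: {node: current load}, failed_node: node id."""
    orphaned = [t for t, n in assignment.items() if n == failed_node]
    del loads[failed_node]
    for task in orphaned:
        target = min(loads, key=loads.get)  # least-loaded surviving node
        assignment[task] = target
        loads[target] += 1                  # unit task cost, for simplicity
    return assignment

assignment = {"nav": 0, "payload": 1, "telemetry": 1, "housekeeping": 2}
loads = {0: 1, 1: 2, 2: 1}
print(reconfigure(assignment, loads, failed_node=1))
```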
17:15 | An Organic Computing Approach for CARLA Simulator

ABSTRACT. Autonomous vehicles are increasingly being equipped with a large number of Electronic Control Units. This development leads to an increasing complexity of the systems, which in turn increases the probability of failures and unforeseen errors. To address these challenges, this article presents the integration of the Artificial DNA (ADNA)-based Organic Computing approach into the CAR Learning to Act (CARLA) simulator. CARLA is a powerful tool for the automotive industry to explore autonomous driving in a cost-efficient way. It therefore offers an ideal environment for testing innovative solutions from the field of Organic Computing. The research objective is to implement and evaluate Organic Computing methods in a vehicle environment in order to increase the reliability of vehicle functions. Thanks to the ADNA-based Organic Computing approach, the self-* properties of vehicles become available, and their driving behaviour can be studied. First experiments are presented in which the vehicle is controlled both manually and autonomously, entirely through the ADNA-based Organic Computing approach.
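For orientation, the snippet below uses the public CARLA Python API to spawn a vehicle and switch between direct control and autopilot, the two operation modes mentioned for the experiments; the ADNA integration itself is not shown.

```python
# Minimal CARLA Python API sketch (the ADNA-based control layer is not part of this).
import carla

client = carla.Client("localhost", 2000)  # default CARLA server endpoint
client.set_timeout(10.0)
world = client.get_world()

blueprint = world.get_blueprint_library().filter("vehicle.*")[0]
spawn_point = world.get_map().get_spawn_points()[0]
vehicle = world.spawn_actor(blueprint, spawn_point)

# Manual-style control: apply throttle/steer directly ...
vehicle.apply_control(carla.VehicleControl(throttle=0.5, steer=0.0))
# ... or hand the vehicle over to autonomous control.
vehicle.set_autopilot(True)
```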
Social Event