DSD 2024: 2024 EUROMICRO DIGITAL SYSTEM DESIGN CONFERENCE
PROGRAM FOR THURSDAY, AUGUST 29TH

09:40-11:10 Session 5A: Hardware Acceleration Analysis and Scheduling
Chair:
Location: Room 109
09:40
Exploration of Custom Floating-Point Formats: A Systematic Approach

ABSTRACT. The remarkable advancements in AI algorithms over the past three decades have been paralleled by an exponential growth in their complexity, with parameter counts soaring from 60,000 in LeNet during the late 1980s to a staggering 175 billion in GPT-3. To mitigate this surge in memory footprint, approximate computing has emerged as a promising strategy, focusing on deploying the minimal resolution necessary to maintain acceptable accuracy. Yet, current practices are hindered by two major challenges: a) the process of identifying the optimal resolution and representation format for each tensor remains a manual, ad hoc task, and b) the representation, typically in floating-point (FP) format, is confined to standardized norms predominantly supported by commercial off-the-shelf (COTS) products like GPUs. This paper tackles these issues by introducing a systematic approach to exploring the FP representation design space to find the ideal FP format for each tensor, thereby leveraging the full potential of FP quantization techniques. It is designed for custom hardware, enabling access to arbitrary FP formats, but also allows users to limit their exploration to standard FP formats, making it compatible with COTS. Additionally, the proposed method explores the Block Floating-Point (BFP) representation and automatically decides on the size of the blocks. A heuristic-based search method is proposed to handle the large design space. The proposed approach is general, and the heuristic is not biased towards any specific category of algorithms. We apply this method to a Self-Organizing Map (SOM) for bacterial genome identification and to the LeNet-5 neural network, demonstrating a significant reduction in memory footprint of around 94% and 96%, respectively, compared to the conventional 32-bit FP baseline.
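
As a rough, self-contained illustration of what quantizing a tensor to a custom exponent/mantissa split means (a generic sketch, not the paper's exploration framework; subnormal, NaN and block-floating-point handling are omitted, and the format choices below are only examples):

```python
import numpy as np

def quantize_custom_fp(x, exp_bits, man_bits):
    """Round values onto the grid of a custom floating-point format with
    the given exponent and mantissa widths (toy model, normal numbers only)."""
    x = np.asarray(x, dtype=np.float64)
    bias = 2 ** (exp_bits - 1) - 1
    sign, mag = np.sign(x), np.abs(x)
    # per-value exponent, clamped to the representable range
    e = np.floor(np.log2(np.maximum(mag, np.finfo(float).tiny)))
    e = np.clip(e, 1 - bias, bias)
    step = 2.0 ** (e - man_bits)                 # mantissa grid spacing
    q = np.round(mag / step) * step
    max_val = (2 - 2.0 ** -man_bits) * 2.0 ** bias
    return sign * np.minimum(q, max_val)

# Explore how the representation error grows as the format shrinks.
w = np.random.randn(1000).astype(np.float32)
for e_bits, m_bits in [(8, 23), (5, 10), (4, 3)]:   # FP32, FP16, a custom FP8
    err = np.abs(quantize_custom_fp(w, e_bits, m_bits) - w).mean()
    print(f"E{e_bits}M{m_bits}: mean abs error {err:.2e}")
```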

10:10
Hardware-level Access Control and Scheduling of Shared Hardware Accelerators

ABSTRACT. With the trend to consolidate hardware on a single platform, FPGA virtualization plays an increasingly important role in the embedded domain. FPGA virtualization allows multiple software tasks or even guest operating systems to share reconfigurable resources. However, state-of-the-art approaches assign each hardware accelerator to a single software task for a fixed duration. This becomes a problem when the hardware accelerators concurrently required by software tasks exceed the available FPGA area. If several software tasks request acceleration of the same functionality, accelerators can be shared. Embedded reconfigurable systems face the challenge of a uniform address space: when several tasks use a memory-mapped communication interface that allows direct access to the accelerator’s address space, access control and protection from unauthorized access must be ensured. Existing software-based approaches lead to high latencies. Thus, we propose a hardware-level scheduler that schedules hardware tasks both spatially and temporally. The allocation of a hardware accelerator is combined with the assignment of access rights, and any unauthorized access leads to a page fault. When hardware tasks share an accelerator, they are scheduled according to the Earliest Deadline First (EDF) policy, with buffers ensuring data isolation. Compared to hardware task scheduling in software, a 7.02x performance increase is achieved.
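
For readers unfamiliar with the policy, EDF simply serves, among the requests currently pending for the shared accelerator, the one with the earliest absolute deadline. A minimal software sketch of that selection rule (illustrative only; the paper implements the scheduler in hardware, and the request tuples below are invented for the example):

```python
import heapq

def edf_schedule(requests):
    """requests: list of (release_time, deadline, duration, task_id) for one
    shared accelerator. Returns (start_time, task_id) pairs in service order."""
    requests = sorted(requests)                  # by release time
    ready, order, t, i = [], [], 0, 0
    while i < len(requests) or ready:
        while i < len(requests) and requests[i][0] <= t:
            rel, dl, dur, tid = requests[i]      # admit released requests
            heapq.heappush(ready, (dl, dur, tid))
            i += 1
        if not ready:                            # idle until the next release
            t = requests[i][0]
            continue
        dl, dur, tid = heapq.heappop(ready)      # earliest deadline first
        order.append((t, tid))
        t += dur
    return order

print(edf_schedule([(0, 10, 3, "A"), (1, 5, 2, "B"), (2, 8, 1, "C")]))
```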

10:40
Achieving Flexible Performance Isolation on the AMD Xilinx Zynq UltraScale+

ABSTRACT. Co-hosting different tasks on the same MPSoC contributes to increasing average performance by allowing them to share MPSoC resources that would otherwise be underutilized. However, resource sharing challenges performance isolation among tasks, as required in time-sensitive embedded critical systems like automotive and avionics. On the other hand, resource isolation through segregation (the reference solution for preventing the propagation of time-related safety issues) is detrimental to average performance. In this work, we show that the built-in QoS support in modern MPSoCs can be smartly leveraged to adapt to the timing and performance requirements of the running applications. In particular, we develop specific configurations of the complex QoS support in the Zynq UltraScale+ MPSoC that deliver performance isolation for time-sensitive tasks (TSTs) and ensure that non-time-sensitive tasks (NTSTs) maximize their average performance by exploiting the resources not used by TSTs. Our results show that the TSTs with the most stringent safety constraints achieve high degrees of isolation, 96.0% of their solo performance on average, while the NTSTs achieve performance ranging from 72% down to 4% depending on the resources left available by the TSTs.

10:55
Partial Reconfiguration for Energy-Efficient Inference on FPGA: A Case Study with ResNet-18

ABSTRACT. Efficient acceleration of deep convolutional neural networks is currently a major focus in Edge Computing research. This paper presents a realistic case study on ResNet-18, exploring Partial Reconfiguration (PR) as an alternative to the standard static reconfigurable approach. The PR strategy is based on sequencing the layers of the DNN on a single reconfigurable region to significantly reduce the amount of Programmable Logic (PL) resources required. Results demonstrate that PR-based acceleration can reduce FPGA resource usage by over 6 times, power consumption by 3.2 times, and the corresponding global energy cost by 2.7 times, with only a 17.5% increase in execution time. This approach shows great potential for further reductions in area and power consumption.

09:40-11:10 Session 5B: Hardware, Software, and Tools for the IoT-to-Edge-to-Cloud Continuum
Location: Room 106
09:40
INVITED PAPER - Leveraging Reusable Code and Proofs to Design Complex DRAM Controllers -- A Case Study

ABSTRACT. Critical real-time systems are getting more and more complex and require ever more computing power. Multi-core platforms, GPUs, and custom accelerators promise to deliver this needed performance. However, these platforms are notoriously hard to analyze and lack predictability in terms of timing properties. Computer architectures and platforms that offer both predictability and performance are thus needed.

This work investigates the use of the interactive proof assistant Coq in order to model complex DRAM memory controllers (MCs) for multi-core platforms. The design of predictable high-performance MCs is particularly challenging, since memory requests have to be processed efficiently, while facing interference from other cores in the system. The problem is exacerbated by the complexity of DRAM devices and the various timing constraints they impose. Specifically, this work extends a previous Coq framework by focusing on reusability, which allows designers to develop and prove complex MCs. As a use-case, we present TDMShelve, an MC balancing performance and isolation.

10:02
SmartDMA: Adaptable Memory Access Controller for CGRA-based Processing Systems

ABSTRACT. Modern computing platforms exploiting Coarse-Grained Reconfigurable Array (CGRA) architectures depend highly on the efficiency with which data is handled inside the architecture. Especially in compute-intensive systems, data movement is critical to maximizing energy efficiency and reducing latency. Access to main memory is the most costly operation; the data retrieved must therefore be kept near the processing elements of the architecture as long as possible to reduce data transfers. Modern algorithms involve very different access patterns to main memory, requiring high versatility from Direct Memory Access (DMA) mechanisms. This work presents the SmartDMA architecture, a RISC-V-based programmable DMA controller specifically designed to perform adaptable memory access patterns and implement proper data reuse policies in CGRA-based systems. It comprises a set of Data Mover Engines (DMEs) that implement configurable 1D, 2D, and 3D data movements. Exploiting a custom RISC-V ISA extension and a programmable event network, the application-specific firmware loaded on the SmartDMA can schedule DMA commands among all DMEs, ensuring they are always busy with data transactions. We show a typical use case that exploits CGRA-based processing and highlights the functionality of the SmartDMA. We synthesized the SmartDMA on TSMC 40nm low-power standard-cell technology in three architectural configurations (small, medium, and large) at 200 MHz, with a maximum memory throughput of 3.2 GB/s. The small configuration occupies 45k µm² of cell area and consumes 5.0 mW, the medium occupies 120k µm² and consumes 13.5 mW, and the large occupies 425k µm² and consumes 43.2 mW.
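
To give a flavor of what a configurable multi-dimensional DMA command expresses, here is a generic strided-copy model in Python; the Dma2DCommand fields are invented for this sketch and do not correspond to SmartDMA's actual descriptor format:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Dma2DCommand:
    src: int          # source base offset (elements)
    dst: int          # destination base offset
    length: int       # elements per row
    rows: int         # number of rows to move
    src_stride: int   # distance between row starts in the source
    dst_stride: int   # distance between row starts in the destination

def run_dma(cmd, src_mem, dst_mem):
    """Software model of a 2D strided transfer."""
    for r in range(cmd.rows):
        s = cmd.src + r * cmd.src_stride
        d = cmd.dst + r * cmd.dst_stride
        dst_mem[d:d + cmd.length] = src_mem[s:s + cmd.length]

# Gather a 4x4 tile of a 16-wide row-major image into a contiguous local
# buffer -- the kind of reuse-friendly pattern a CGRA kernel consumes.
image = np.arange(16 * 16)
tile = np.zeros(16, dtype=image.dtype)
run_dma(Dma2DCommand(src=2 * 16 + 3, dst=0, length=4, rows=4,
                     src_stride=16, dst_stride=4), image, tile)
print(tile.reshape(4, 4))
```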

10:24
Flexible Precision Vector Extension for Energy Efficient Coarse-Grained Reconfigurable Array AI-Engine

ABSTRACT. The rapid development of Artificial Intelligence (AI) algorithms has created a need for resource-optimised hardware accelerators. Among various platforms, Coarse-Grained Reconfigurable Arrays (CGRAs) have gained importance as on-edge accelerators. They comprise a heterogeneous Processing Element (PE) matrix, which allows for high flexibility and parallelisation of calculations. They are mainly used for speeding up Data Flow Graph (DFG) execution.

We aim to provide a general-purpose, highly parameterised and flexible architecture for AI on-edge data crunching. We propose a CGRA with a vector extension which allows for dynamically adjustable precision of calculation while maintaining a desired performance-power-area optimisation. It targets 4-bit integer (INT4) and 8-bit integer (INT8) quantization for fast and efficient Neural Network (NN) processing. In this paper, we examine the hardware costs required to support the vector extension functionality. We synthesised the design on the 40nm standard-cell technology from TSMC. The obtained results show that the proposed extension attains an average 28.2% decrease in power consumption and a 21.6% decrease in area compared to a reference design of the same computation power.
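
As a back-of-the-envelope illustration of the precision/accuracy trade-off such a flexible-precision datapath exposes (plain software emulation of symmetric quantization, unrelated to the proposed RTL), the same dot product can be evaluated with INT8 and INT4 operands:

```python
import numpy as np

def quantize(x, bits):
    """Symmetric linear quantization of a vector to signed `bits`-bit integers."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / qmax
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int32)
    return q, scale

def int_dot(a, b):
    return int(np.sum(a.astype(np.int64) * b.astype(np.int64)))

x, w = np.random.randn(256), np.random.randn(256)
for bits in (8, 4):                  # INT8 vs INT4 lanes of the same datapath
    xq, sx = quantize(x, bits)
    wq, sw = quantize(w, bits)
    print(f"INT{bits}: {int_dot(xq, wq) * sx * sw:.3f} "
          f"(fp reference {float(x @ w):.3f})")
```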

10:39
AUTOSAR AP and ROS 2 Collaboration Framework

ABSTRACT. The field of autonomous vehicle research is advancing rapidly, necessitating platforms that meet real-time performance, safety, and security requirements for practical deployment. The AUTOSAR Adaptive Platform (AUTOSAR AP) is widely adopted in development to meet these criteria; however, licensing constraints and tool implementation challenges limit its use in research. Conversely, Robot Operating System 2 (ROS 2) is predominantly used in research within the autonomous driving domain, leading to a disparity between research and development platforms that hinders swift commercialization. This paper proposes a collaboration framework that enables AUTOSAR AP and ROS 2 to communicate with each other: ROS 2 communicates using the Data Distribution Service for Real-Time Systems (DDS), whereas AUTOSAR AP uses Scalable service-Oriented Middleware over IP (SOME/IP). The proposed framework bridges these protocol differences, ensuring seamless interaction between the two platforms. We validate the functionality and performance of our bridge converter through empirical analysis, demonstrating its efficiency in conversion time and ease of integration with ROS 2 tools. Furthermore, the usability of the proposed collaboration framework is improved by automatically generating a configuration file for the proposed bridge converter.

10:54
Multiprotocol Middleware Translator for IoT

ABSTRACT. The increasing number of IoT deployment scenarios and applications has fostered the development of a multitude of specially crafted communication solutions, several of them proprietary, which erect barriers to IoT interoperability and impair its pervasiveness. To address such problems, several middleware solutions exist to standardize IoT communications, hence promoting and facilitating interoperability. Although such middleware is increasingly adopted in IoT systems, it has become clear that there is no “one size fits all” solution able to address the multiple Quality-of-Service requirements heterogeneous IoT systems may impose. Consequently, we witness new interoperability challenges regarding the usage of diverse middlewares. In this work, we address this issue by proposing a novel architecture, polyglIoT, that can effectively interconnect diverse middleware solutions while considering QoS requirements alongside the proposed translation. We analyze the performance and robustness of the solution and show that such a multiprotocol translator is feasible and can achieve high performance, thus becoming a fundamental piece in enabling future highly heterogeneous IoT systems of systems.

09:40-11:10 Session 5C: Open-Source Methods, Architectures, Tools and Technologies for RISC-V
Location: Room 116
09:40
Seal5: Semi-automated LLVM Support for RISC-V ISA Extensions Including Autovectorization

ABSTRACT. The RISC-V instruction set architecture (ISA) is popular for its extensibility, allowing easy integration of custom vendor-defined instructions tailored to specific applications. However, quick exploration of instruction candidates fails due to the lack of tools to auto-generate embedded software toolchain support. In particular, exploiting SIMD instructions to accelerate typical DSP and machine learning workloads needs specialized integration. This work establishes a semi-automated flow to generate LLVM compiler support for custom instructions based on a C-style ISA description language. The implemented Seal5 tool is capable of generating support ranging from baseline assembler-level support, through builtin functions, to compiler code-generation patterns for scalar as well as vector instructions, while requiring no deep compiler know-how. This paper focuses primarily on a novel pattern-generator approach for optimized code generation for SIMD instructions, including support for autovectorization. The auto-generated LLVM toolchain reduces development time drastically while performing similarly to or better than the existing, manually implemented Core-V reference LLVM toolchain on a wide variety of benchmarks. Seal5 further allows the addition of compiler code-generation support for the Core-V SIMD instructions, which is not yet available in the reference toolchain. Additionally, Seal5 facilitates quick exploration of custom instruction candidates, as demonstrated for a cryptography extension.

10:10
Coordinating the fetch and issue warp schedulers to increase the timing predictability of GPUs

ABSTRACT. The verification of a time-critical system requires precise analysis of execution times, which in turn requires assumptions about the system's behavior, especially when it is poorly documented. One of those assumptions, when the system features a GPU accelerator, relates to the policy that each Streaming Multiprocessor (SM) follows to schedule warps. We argue that the literature overlooks the lack of synchronization between the instruction fetch and instruction issue schedulers, even though it is likely to make the behavior of existing GPUs unpredictable. We propose to coordinate the action of the fetch and issue stages in GPU pipelines in order to enable reliable static timing analysis. We implement our approach in Vortex, a RISC-V-based open-source GPU, and report experiments showing that it makes warp scheduling predictable at little performance cost.

10:40
A Suite of Processors to Explore CHERI-RISC-V Microarchitecture

ABSTRACT. We present the implementation of Capability Hardware Enhanced RISC Instructions (CHERI) secure capabilities for RISC-V microarchitectures. This includes implementations for three different scales of core, including microcontrollers and the first open application of CHERI to a superscalar processor, and investigates the scaling of CHERI extensions across a range of core complexities. CHERI offers a contemporary cross-architecture description of capabilities. The initial CHERI implementation extended a MIPS processor. Based on its success in this context, we investigate the microarchitectural implications across a wider range of processors. To improve adoption, this work is performed on the more contemporary RISC-V architecture. We first extend the Piccolo and Flute microcontrollers, evaluate the area and frequency implications on FPGA, and provide an initial performance evaluation. To validate correctness, the processors are integrated into the TestRIG infrastructure. We then extend RiscyOO for the first open instantiation of CHERI in a superscalar out-of-order application-class core. We explore new questions raised by the more sophisticated microarchitecture and highlight further architectural tradeoffs. Again, the processor is evaluated on FPGA, investigating area, frequency, and performance. We are then able to present the first analysis of the scaling of CHERI overheads with core complexity. Based on these results, the ratification of CHERI RISC-V is now underway.

10:55
Accelerating Galois Field Arithmetic based Cryptographic Algorithms on RISC-V cores using Carryless Multiplication - WiP

ABSTRACT. Edge computing emerges as a critical paradigm in the wake of the Internet of Things (IoT) and 5G New Radio (5G NR). It catalyzes the demand for energy-efficient devices that have resilient CPUs with lean physical footprints. Mitigating the security challenges in these networked devices necessitates architectures with Bit Manipulation Instructions (BMI) to improve Galois Field (GF) arithmetic, a fundamental step in most cryptographic algorithms. All major Instruction Set Architectures (ISAs), including RISC-V, incorporate dedicated instructions for carryless multiplication, recognizing its significant contribution to cryptographic applications. Acknowledging this fact, this paper introduces a novel approach to enhance the performance of GF arithmetic using carryless multiplication. The approach presents a promising avenue by improving the execution cycle counts of a real-world cryptographic application such as the Advanced Encryption Standard (AES) and can be scaled to all GF-based cryptographic algorithms. The proposed GF algorithm maps effectively to the carryless multiplication instructions of the ratified RISC-V Zbc extension. Evaluations indicate about a 4.5x performance improvement for multiple schemes of AES using an open-source RISC-V core (SweRV EL2™ 1.3) without incurring any additional overhead in terms of area or compiler support.
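
As a concrete illustration of the underlying technique (standard GF(2^8) arithmetic as used in AES, not the paper's exact Zbc mapping), a field multiplication is a carryless multiply followed by reduction modulo the AES polynomial x^8 + x^4 + x^3 + x + 1; the worked values are the examples from FIPS-197:

```python
def clmul(a: int, b: int) -> int:
    """Carryless (XOR) multiplication, the operation provided by Zbc's clmul."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def gf256_mul(a: int, b: int) -> int:
    """Multiplication in GF(2^8) with the AES polynomial 0x11B."""
    p = clmul(a, b)                  # up-to-15-bit carryless product
    for i in range(14, 7, -1):       # reduce the high bits one at a time
        if p & (1 << i):
            p ^= 0x11B << (i - 8)
    return p

assert gf256_mul(0x57, 0x83) == 0xC1   # worked example from FIPS-197
assert gf256_mul(0x57, 0x13) == 0xFE   # second FIPS-197 example
print("GF(2^8) multiplication via carryless multiply: OK")
```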

11:25-12:55 Session 6A: Security and Monitoring of Hardware Devices
Location: Room 109
11:25
Circuit Disguise: Detecting Malicious Circuits in Cloud FPGAs without IP Disclosure

ABSTRACT. At present, the state-of-the-art cloud FPGA deployment process does not allow the cloud provider to perform design checks for malicious circuits unless the clients' designs are available in an unprotected form. In this paper, we introduce the circuit disguise method that allows the design checks to be performed without disclosing the clients' Intellectual Property (IP). The method is based on a lossy circuit transformation that generates a compressed version of the netlist specifying the client's design. While the design checks can still be performed on the compressed version of the netlist, reversing the transformation to recover the original design is not possible. The circuit disguise method can be used in combination with bitstream encryption. This enables the clients to protect not only the designs but also the bitstreams. Furthermore, with circuit disguise, new design checks can be performed on designs that are already compiled into a protected bitstream. We present an implementation of the circuit disguise method and demonstrate its effectiveness with various benign and malicious benchmark designs. The implementation is publicly available.

11:55
Scripting the Unpredictable: Automate Fault Injection in RTL Simulation for Vulnerability Assessment

ABSTRACT. This paper presents FISSA, an open-source software tool that facilitates the building of fault injection campaigns on top of well-known HDL simulation tools. The proposed solution relies on two software modules that encapsulate an existing HDL simulator. The first module generates TCL scripts to drive the simulation process and automatically inject faults according to the user's needs. The second module is dedicated to fault analysis, enabling users to assess the resilience of their design. The proposed approach allows designers to seamlessly integrate fault injection simulations into their workflow. To demonstrate the solution's capabilities, this paper proposes a case study evaluating the robustness of a Dynamic Information Flow Tracking mechanism integrated into a RISC-V processor against different fault injection scenarios. For that purpose, a total of 360,747 simulations have been performed.
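
To illustrate the general idea of script-driven fault injection (an invented example, not FISSA's actual output: the signal paths are hypothetical and the QuestaSim/ModelSim command options may differ in practice), a campaign generator can emit one TCL script per fault:

```python
import random

TARGETS = ["sim:/top_tb/dut/cpu/regfile/r5",      # hypothetical signal paths
           "sim:/top_tb/dut/cpu/dift/tag_reg"]

def make_campaign(n_runs, sim_time_ns, seed=0):
    """Generate one TCL script per run, each forcing a random target signal
    to 1 for 20 ns at a random instant (a transient stuck-at-1 fault)."""
    random.seed(seed)
    scripts = []
    for run in range(n_runs):
        target = random.choice(TARGETS)
        t_inject = random.randrange(10, sim_time_ns - 30)
        scripts.append("\n".join([
            f"# fault-injection run {run}: {target} at {t_inject} ns",
            "vsim -quiet work.top_tb",
            f"run {t_inject} ns",
            f"force -freeze {target} 1",
            "run 20 ns",
            f"noforce {target}",
            f"run {sim_time_ns - t_inject - 20} ns",
            f"examine {target}",
            "quit -sim",
        ]))
    return scripts

for script in make_campaign(2, 1000):
    print(script, end="\n\n")
```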

12:10
Dynamic Frequency Boosting of RISC-V FPSoCs Through Monitoring Runtime Path Activations

ABSTRACT. In this paper, we explore the path activation variability within an FPGA-instantiated Rocket RISC-V System-on-Chip (FPSoC) and evaluate the potential performance improvement under dynamic frequency boosting. We define an analytical performance model and extend the RISC-V core with a lightweight activation-monitoring framework to enable in-depth analysis. Through an extensive experimental campaign, we explore the data-dependent activation behavior of timing paths for realistic workloads, i.e., benchmarks and OS commands, executed on a Linux-based FPGA microprocessor. We show promising frequency-boosting margins due to rare activations of the most critical paths, leading to potential performance improvements between 18% and 42%, including the frequency-switching overhead.

12:25
External Memory Protection on FPGA-Based Embedded Systems

ABSTRACT. With the proliferation and increased capabilities of embedded systems, they become more exposed and easier targets for attacks. This includes external components such as DRAM, which are particularly vulnerable, especially regarding unauthorized access to the stored data. With the goal of increasing storage security, this paper proposes a memory-bridge evaluation platform and an improvement to the authenticated encryption of off-chip memory that minimizes critical memory access latency. Most state-of-the-art works either use custom high-latency solutions or rely exclusively on the AES-GCM standard for Authenticated Encryption with Associated Data (AEAD). Besides AES-GCM, this work explores and implements different protection solutions, including NOEKEON-GCM and AEGIS-128L. This work also explores how much of the algorithms can be pre-computed between memory transmissions, by removing the address and data from critical-path computations. The presented prototypes were evaluated on a Xilinx Zynq-7100. The experimental results show that the developed platform allows assessing and selecting different approaches, and indicate a 56% improvement in memory access latency over equivalent AES-GCM designs thanks to the exploration of pipelining and parallelization options.
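
The core idea of authenticated off-chip memory can be modeled in a few lines of software (a sketch using AES-GCM from the Python cryptography package, not the paper's hardware pipeline): each memory block is encrypted with a nonce derived from its address and a per-write counter, so relocated or replayed data fails authentication.

```python
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=128)
aead = AESGCM(key)

def write_block(address: int, counter: int, plaintext: bytes) -> bytes:
    """Encrypt one memory block; returns ciphertext plus 16-byte tag."""
    nonce = address.to_bytes(6, "big") + counter.to_bytes(6, "big")
    aad = address.to_bytes(8, "big")          # authenticated but not encrypted
    return aead.encrypt(nonce, plaintext, aad)

def read_block(address: int, counter: int, stored: bytes) -> bytes:
    """Decrypt and authenticate; raises if the data was moved or replayed."""
    nonce = address.to_bytes(6, "big") + counter.to_bytes(6, "big")
    return aead.decrypt(nonce, stored, address.to_bytes(8, "big"))

blob = write_block(0x8000_0000, 1, b"A" * 64)        # one 64-byte line
assert read_block(0x8000_0000, 1, blob) == b"A" * 64
```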

12:40
Event Monitor Validation in High-Integrity Systems

ABSTRACT. Platforms for modern embedded systems incorporate an increasing number of high-performance features to provide the required levels of performance. Timing analysis solutions handle the complexity of these platforms by relying on hardware event monitors (HEMs) that provide insightful information about resource utilization and, hence, contention among tasks. As a result, HEMs have become a key element in warranting the safe timing behavior of a system, for which reason they must be validated. While some initial works target HEM validation, they consider one HEM at a time and focus on those HEMs for which an expert can establish an expected value for relatively small code snippets. In this paper, we propose a methodology for the validation of those HEMs for which a specific expected value cannot be established a priori even for simple cases and which, instead, need to be validated in conjunction with other HEMs. Our method also deals with the natural variability of HEM values in high-performance platforms when collected in different experiments. We illustrate the effectiveness of our proposed technique by validating HEMs related to cache coherence on a relevant platform in the avionics domain.

11:25-12:55 Session 6B: Safety, Security and Privacy of Cyber-Physical Systems, and Computations at the Edge
Location: Room 106
11:25
Automated Polyhedron-based TDMA Schedule Design for Predictable Mixed-Criticality MPSoCs

ABSTRACT. The ongoing trend of ECU consolidation and the resulting integration of previously distributed functionality into central car servers leads to mixed-criticality systems on these platforms. Multiprocessor system-on-chip architectures provide a powerful basis for the execution of critical and non-critical tasks. A challenge arises in preventing contention when accessing shared resources, as well as in determining and minimizing the expected interference, since deadline-bound execution inside these systems is constrained by the interference affecting the executed tasks. Deadline-bound task completion is often facilitated by hypervisors and real-time operating systems, which provide a runtime environment that enables critical tasks to complete before their deadlines in mixed-criticality systems. These runtime environments frequently employ TDMA for guaranteed and certifiable execution. In this work, we introduce an algorithm that creates schedule tables for TDMA schedulers supporting isolation mechanisms. Using a constructive approach, the resulting scheduling tables support mechanisms for parallel access to shared resources and windows for exclusive execution. Our approach is validated by generating synthetic scheduling tables, achieving a high success rate in all cases for processor utilizations of up to 80% and a maximum runtime of 10 seconds.
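
To make the artifact concrete, the output of such an algorithm is a cyclic table of time windows over one hyperperiod. The toy constructor below (greedy back-to-back placement, not the paper's polyhedron-based method; task names and budgets are invented) shows the shape of that table:

```python
from math import gcd
from functools import reduce

def build_tdma_table(tasks):
    """tasks: name -> (period, budget). Returns (start, end, name) windows
    over one hyperperiod, or None if the requested budgets do not fit."""
    hyper = reduce(lambda a, b: a * b // gcd(a, b),
                   (period for period, _ in tasks.values()))
    table, cursor = [], 0
    for name, (period, budget) in sorted(tasks.items(), key=lambda kv: kv[1][0]):
        offset = cursor                      # fixed offset inside every period
        for k in range(hyper // period):
            start = k * period + offset
            table.append((start, start + budget, name))
        cursor += budget
    table.sort()
    for (_, end1, _), (start2, _, _) in zip(table, table[1:]):
        if start2 < end1:                    # overlapping windows: infeasible
            return None
    return table

print(build_tdma_table({"critical": (10, 2), "best_effort": (20, 5)}))
# [(0, 2, 'critical'), (2, 7, 'best_effort'), (10, 12, 'critical')]
```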

11:55
DA-CGRA: Domain-Aware Heterogeneous Coarse-Grained Reconfigurable Architecture for the Edge

ABSTRACT. Coarse-Grained Reconfigurable Architectures (CGRAs) are one of the promising solutions for power-hungry edge devices, as they provide a good balance between reconfigurability, performance and energy efficiency. Most of the proposed CGRAs feature a homogeneous set of processing elements (PEs) which all support the same set of operations. Homogeneous PEs can lead to high unwanted power consumption. As application benchmarks utilize different operations irregularly, heterogeneous PE design is a powerful approach to reduce the power consumption of a CGRA. In this paper, we propose DA-CGRA, a domain-aware CGRA tailored to signal processing applications. To derive the heterogeneous architecture, a set of signal processing applications is first profiled to extract the applications' requirements in terms of operation types, operation counts and memory usage. Then, domain-specific PEs are designed in Verilog RTL based on the profiling results. We select a spatio-temporal or spatial execution model based on the application features to increase overall performance and efficiency. Experimental results demonstrate that DA-CGRA outperforms the state-of-the-art CGRAs FLEX and RipTide in terms of energy efficiency by 23% and 38%, respectively. Moreover, DA-CGRA achieves a 3.2× performance improvement over HM-HyCUBE.

12:10
Efficient Edge AI: Deploying Convolutional Neural Networks on FPGA with the Gemmini Accelerator

ABSTRACT. The growing concerns regarding energy consumption and privacy have prompted the development of AI solutions deployable at the edge, circumventing the substantial CO2 emissions associated with cloud servers and mitigating the risks related to sharing sensitive data. But deploying Convolutional Neural Networks (CNNs) on non-off-the-shelf edge devices remains a complex and labor-intensive task. In this paper, we present an end-to-end workflow for the deployment of CNNs on Field Programmable Gate Arrays (FPGAs) using the Gemmini accelerator, which we modified for efficient implementation on FPGAs. We describe how we leverage open-source software in each optimization step of the deployment process, the customizations we added, and their impact on the final system's performance. We achieve real-time performance by deploying a YOLOv7 model on a Xilinx ZCU102 FPGA with an energy efficiency of 36.5 GOP/s/W. Our FPGA-based solution demonstrates superior power efficiency compared with other embedded hardware devices, and even outperforms other FPGA reference implementations. Finally, we show how this kind of solution can be integrated into a wider system by testing our proposed platform in a traffic monitoring scenario.

12:25
PRIV-DRIVE: Privacy-Ensured Federated Learning using Homomorphic Encryption for Driver Fatigue Detection

ABSTRACT. Detecting fatigue in drivers has become increasingly important for safe driving, especially with the use of more smart devices and Internet-connected vehicles. While sharing data between vehicles can enhance fatigue detection systems, privacy concerns pose significant barriers to this sharing process. We propose a Federated Learning (FL) approach for monitoring fatigue-driven behavior to address these challenges. However, there is a concern that the drivers' private information might be leaked in the FL system. In this paper, we introduce PRIV-DRIVE, a novel approach for privacy-enhanced fatigue detection applications. Our method integrates Paillier homomorphic encryption (PHE) with a top-k parameter selection technique, bolstering privacy and confidentiality in federated fatigue detection systems. This approach reduces communication and computation overhead while ensuring model accuracy. To the best of our knowledge, this is the first paper to implement PHE in FL setups for fatigue detection applications. We ran several experiments and evaluated the PRIV-DRIVE method. The results show substantial efficiency gains with different HE key sizes, reducing computation time by up to 99% and communication traffic by up to 95%. Importantly, these improvements have minimal impact on accuracy, effectively meeting the requirements of fatigue detection applications.
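
The additive homomorphism that makes Paillier attractive for federated aggregation can be shown in a few lines (a textbook toy with an unsafely small key and a naive primality test, purely for illustration and unrelated to PRIV-DRIVE's implementation): the server multiplies ciphertexts and thereby adds the clients' updates without decrypting any of them.

```python
import random
from math import gcd

def keygen(bits=512):
    def prime(b):                                 # naive Fermat-tested prime
        while True:
            p = random.getrandbits(b) | (1 << (b - 1)) | 1
            if all(pow(a, p - 1, p) == 1 for a in (2, 3, 5, 7, 11, 13)):
                return p
    p, q = prime(bits // 2), prime(bits // 2)
    n = p * q
    lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow((pow(n + 1, lam, n * n) - 1) // n, -1, n)
    return (n,), (n, lam, mu)

def encrypt(pub, m):
    (n,) = pub
    r = random.randrange(2, n)
    return pow(n + 1, m, n * n) * pow(r, n, n * n) % (n * n)

def decrypt(priv, c):
    n, lam, mu = priv
    return (pow(c, lam, n * n) - 1) // n * mu % n

pub, priv = keygen()
updates = [17, 5, 20]                    # quantized updates from three clients
aggregate = 1                            # encryption of 0
for u in updates:
    aggregate = aggregate * encrypt(pub, u) % (pub[0] ** 2)   # homomorphic add
assert decrypt(priv, aggregate) == sum(updates)
print("aggregated without decrypting individual updates:", sum(updates))
```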

11:25-12:55 Session 6C: European Projects in Digital Systems Design – 1
Location: Room 116
11:25
6G-TakeOff: Holistic 3D Networks for 6G Wireless Communications

ABSTRACT. The unified 3D communication networks, integrating standard terrestrial mobile communication networks and non-terrestrial networks (NTNs), are seen as the key enabler for global connectivity in the next generation (6G) wireless communications. To achieve this goal, new technologies and components are needed in order to meet the requirements for the 6G networks in terms of higher data rates, enhanced reliability and security, and network reconfigurability. This work introduces the German project 6G-TakeOff, aimed at the design of solutions for unified 3D networks for 6G wireless communication systems. The project consortium brings together academic and industrial partners from Germany and Europe, covering the entire value chain from design of electronics to applications. This work presents the key hardware components required for 3D networks and the concept for demonstration of their functionality.

11:55
3D Decision Support Tool for Brain Tumour Surgery: The STRATUM Project

ABSTRACT. Integrated digital diagnostics can support complex surgical procedures in many anatomical sites, brain tumour surgery being the most complex. STRATUM is a 5-year Horizon Europe funded project with the goal of developing an innovative 3D decision support tool for brain tumour surgeries, based on real-time multimodal data processing using artificial intelligence algorithms. The proposed tool is envisioned as an energy-efficient Point-of-Care computing system to be integrated within neurosurgical workflows to aid surgeons in making informed, efficient, and accurate decisions during surgical procedures. The expected long-term impact of STRATUM is to reduce the duration of surgical procedures, thus decreasing patients’ risks, while also optimising the resources of European health care systems.

12:25
REBECCA: Reconfigurable Heterogeneous Highly Parallel Processing Platform for safe and secure AI

ABSTRACT. Modern solutions in all computing domains are starting to heavily adopt AI/ML components to unlock advanced capabilities and functionalities not possible before. This revolution places a great burden on the compute devices used, as performance requirements rise sharply, leading to the deployment of all but the simplest AI components in the cloud. There are, however, numerous application domains (e.g. automotive and healthcare) in which this cloud-based AI approach is not feasible or reliable enough, and for this reason multiple solutions that enable AI-related computations at the edge are starting to emerge.

The REBECCA project aims to develop a novel system-on-chip for such edge AI systems. REBECCA will integrate multiple RISC-V general-purpose CPU cores along with advanced AI/ML and security accelerator engines in a single package composed of two chiplets, with the capability to tightly interconnect reconfigurable devices. The main goal is to provide an edge AI solution with significantly higher levels of performance, energy/power efficiency and security compared to existing systems. Furthermore, by supporting the use of reconfigurable hardware, REBECCA will be able to adapt to specific application requirements and extend its functional capabilities without sacrificing the energy efficiency levels that it targets. Besides hardware components, REBECCA will also develop the required software support in terms of operating systems, hypervisors and libraries to enable the full potential of its hardware platform.

15:00-16:30 Session 7: Industrial Session
Location: Auditorium
15:00
NanoXplore: European Leader in FPGA and SoC FPGA

ABSTRACT. NanoXplore is the main European SoC FPGA manufacturer, creating best-in-class solutions that support harsh conditions for space equipment, as well as for space constellations and secure defense applications. With its sovereign European supply chain and highly performant rad-hard components, NanoXplore is the right answer for a wide range of projects. After heading to the stars with the NG-MEDIUM, NanoXplore is also developing a wide array of offerings for all types of needs. This presentation will focus on these highly innovative products and the in-house design suite Impulse, as well as the roadmap for upcoming products.

15:30
Embedded systems development, an example of implementation in a medical environment

ABSTRACT. BodyCAP was created in 2012 in Caen, Normandy, as a result of the encounter between research into human physiology and microelectronics. At that time, it was realized that major advances in the miniaturization of electronic sensors and wireless communication protocols were not being fully exploited for monitoring people's state of health. The company has therefore focused on the development of miniaturized sensors to facilitate the monitoring of people's physiological condition.

The applications targeted are not only in the healthcare sector, but also in research, high-level sports and the military. Throughout its development process, the company has relied on its network of independent international experts to scientifically validate the use and effectiveness of its solutions. BodyCAP's flagship product is an ingestible electronic capsule that measures core temperature via the gastrointestinal tract and transmits it by radio wave to a receiver, which collects the data before making it available on a smartphone or computer.

16:00
How to track and predict design resources for complex chip design projects by jointly including cost and sustainability criteria

ABSTRACT. INNOVA will present an innovative approach, based on its unique software platform, to help reduce design cost while including sustainability criteria. Design cost is reduced through AI-based prediction and tracking capabilities that are accessible to a wide range of users and skill levels.