DSD 2024: 2024 EUROMICRO DIGITAL SYSTEM DESIGN CONFERENCE
PROGRAM FOR FRIDAY, AUGUST 30TH

09:40-11:10 Session 8A: Hyperspectral Imaging Applications, Algorithms and Architectures
Location: Room 109
09:40
Assessment of the performance of a commercial spectral sensor for portable and cost-effective multispectral applications

ABSTRACT. This study evaluates the AS7265x spectral sensor board versus a reference spectrometer, focusing on performance, reflectance measurements, and colour determination. The experimental setup uses a standard ColorChecker chart and skin samples to represent various multispectral applications. The AS7265x demonstrates outstanding system specifications, including superior signal-to-noise ratio (SNR) compared to the benchtop spectrometer. Reflectance spectra comparisons reveal a high level of agreement, with an RMSE below 0.1 in the spectral range of 410-900 nm. Additionally, a bias correction model, derived from the differences between the spectra of both systems, has been applied to the AS7265x data. This correction has significantly improved the RMSE and GFC metrics, which assess spectral discrepancies. Spectra are also analysed in terms of colour using the CIEDE2000 standard, showing notable improvements after the bias correction.
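
A minimal sketch of the two spectral metrics named in the abstract may help readers unfamiliar with them. It assumes the common definitions of RMSE and the goodness-of-fit coefficient (GFC, the normalised absolute inner product of two spectra). The abstract does not describe the form of the bias correction model, so the per-band mean offset used here is purely an assumption for illustration, and all function names and sample values are hypothetical.

```python
import math

def rmse(a, b):
    """Root mean square error between two equally sampled spectra."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def gfc(a, b):
    """Goodness-of-fit coefficient: 1.0 means identical spectral shape."""
    num = abs(sum(x * y for x, y in zip(a, b)))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den

def fit_bias(sensor_spectra, reference_spectra):
    """ASSUMED model: mean sensor-minus-reference offset for each band."""
    n = len(sensor_spectra)
    bands = len(sensor_spectra[0])
    return [
        sum(s[i] - r[i] for s, r in zip(sensor_spectra, reference_spectra)) / n
        for i in range(bands)
    ]

def apply_bias(spectrum, bias):
    """Subtract the per-band offset from one sensor spectrum."""
    return [x - b for x, b in zip(spectrum, bias)]

# Made-up reflectance values for two samples and three bands.
reference = [[0.10, 0.20, 0.40], [0.30, 0.50, 0.60]]
sensor = [[0.15, 0.22, 0.37], [0.35, 0.52, 0.57]]

bias = fit_bias(sensor, reference)
corrected = apply_bias(sensor[0], bias)
print(rmse(sensor[0], reference[0]), rmse(corrected, reference[0]))
print(gfc(sensor[0], reference[0]), gfc(corrected, reference[0]))
```

In practice the offset would be fitted on one set of samples and evaluated on another; fitting and evaluating on the same spectra, as above, only illustrates the mechanics.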

10:02
Assessing Processing Strategies on Data from Medical Hyperspectral Acquisition Systems

ABSTRACT. Hyperspectral imaging (HSI) has gained prominence in medical diagnostics due to its ability to capture and analyse detailed spectral information beyond human visual capabilities. Processing of HSI data is essential to enhance subsequent analysis and ensure the accuracy of results by reducing noise and unwanted artifacts. This paper provides an overview of state-of-the-art processing methods for HSI data, focusing on smoothing, normalization, and spectral derivatives. The efficacy of these methods is evaluated using root mean square error (RMSE) to compare pre-processed data with a wavelength reference standard, alongside execution time considerations. Results indicate that certain algorithms, such as smoothing based on moving average, standard normal variate, and first spectral derivatives, yield superior performance across different medical HSI systems. Additionally, combining these processing techniques further improves data fidelity to the wavelength reference standard. Overall, this study offers insights into optimal processing strategies for enhancing the accuracy and reliability of HSI data.
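
The three processing steps singled out in the abstract can be sketched on a single spectrum as follows. This is a textbook-style illustration, not the authors' implementation: the window size, the edge handling, the use of the sample standard deviation in the standard normal variate (SNV), and the forward-difference derivative are all assumptions.

```python
import math

def moving_average(spectrum, window=3):
    """Smooth with a centred window; the window shrinks at the edges."""
    half = window // 2
    out = []
    for i in range(len(spectrum)):
        lo = max(0, i - half)
        hi = min(len(spectrum), i + half + 1)
        out.append(sum(spectrum[lo:hi]) / (hi - lo))
    return out

def snv(spectrum):
    """Standard normal variate: centre and scale one spectrum by its own statistics."""
    n = len(spectrum)
    mean = sum(spectrum) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in spectrum) / (n - 1))
    return [(x - mean) / std for x in spectrum]

def first_derivative(spectrum, step=1.0):
    """Forward difference between neighbouring bands (one value shorter)."""
    return [(b - a) / step for a, b in zip(spectrum, spectrum[1:])]

# Made-up spectrum; the steps are chained as one possible combination.
raw = [0.20, 0.26, 0.23, 0.31, 0.40, 0.38]
processed = first_derivative(snv(moving_average(raw)))
print(processed)
```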

10:24
Inter-band Movement Compensation Method for Hyperspectral Images based on Spectral Scanning Technology

ABSTRACT. Hyperspectral Imaging (HSI) is a novel non-invasive, label-free technique for rapid medical diagnosis, capturing a wide range of the electromagnetic spectrum in numerous narrow spectral bands. This comprehensive data acquisition poses challenges in clinical settings due to patient movement (e.g., breathing, tremors, heartbeat). This work presents a method to compensate for inter-band movements during HSI acquisition using spectral scanning acquisition systems. The system used in this work is based on a Liquid Crystal Tunable Filter (LCTF), featuring a Kurios VB1 filter (Thorlabs, USA) and a CS135 MUN monochrome camera, capturing the visible range (420-730 nm) in successive frames at different wavelengths.

The proposed method utilizes iterative image registration to align each frame with the initial reference frame, correcting for movements along the x-axis, y-axis, and rotational shifts. The system's efficacy was validated on virtually deformed HS cubes and on a custom motion platform. Results show improved alignment and reduced motion artifacts in the resulting HS images, highlighting the potential of our method to enhance the reliability of HSI in clinical applications.

10:46
HS2RGB: an Encoder Approach to Transform Hyper-Spectral Images to Enriched RGB Images

ABSTRACT. Hyperspectral imaging (HSI) captures detailed spectral information across numerous wavelengths, providing superior object characterization to conventional RGB imaging. Despite these advantages, training deep learning models on HSI data is challenging due to the limited availability of extensive datasets, unlike the more familiar RGB images. To address this issue, we propose an encoder model that transforms hyperspectral images into enriched RGB images. These new enriched images represent a graphical depiction of HSI and become a new dataset to use as input for well-known models pre-trained on RGB images. In this work, we introduce HS2RGB, an encoder model based on the Vision Transformer (ViT) architecture, which condenses hyperspectral data into a three-element vector interpreted as RGB channels. The results demonstrate the effectiveness of the new images generated by the encoder, showing better visual differentiation of features compared to traditional RGB images and greater consistency in latent vectors of the same type of material across different samples compared to images generated with feature selection and transformation techniques like PCA and t-SNE. Finally, we tested the enriched RGB images using Meta's SAM model for instance segmentation, revealing that our model's images provided more precise identification of regions of interest, such as tumours in medical images.

09:40-11:10 Session 8B: Hardware and Resource Aware Digital AI
Location: Room 106
09:40
Optimizing Data Compression: Enhanced Golomb-Rice Encoding with Parallel Decoding Strategies for TinyML Models

ABSTRACT. Deep Neural Networks (DNNs) offer possibilities for tackling practical challenges and broadening the scope of Artificial Intelligence (AI) applications. The demanding memory requirements of present-day neural networks can be attributed to the rising intricacy of network architectures. These designs encompass multiple layers with an extensive number of parameters, leading to heightened demands on memory storage. The energy consumption during the inference execution of DNNs is predominantly attributed to the access and processing of these parameters. To tackle the significant size of models integrated into Internet of Things (IoT) devices, a promising strategy involves diminishing the bit width of weights. This paper introduces an improved version of the Golomb-Rice (GR) encoder and an optimized Parallel Golomb-Rice decoder that can support sparse and non-sparse DNNs. To evaluate the encoder's and decoder's efficiency, we conducted two sets of experiments using three TinyML benchmarks, one without pruning and the other incorporating pruning. The results highlight that the encoder demonstrates a Compression-Ratio (CR) superior to that of Huffman encoding, and the decoder exhibits an energy efficiency of up to 2.6 TBps/W and 2.7 TBps/W for four- and eight-weight decoding, respectively.
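
For readers unfamiliar with the baseline, the sketch below shows plain textbook Golomb-Rice coding of small signed integer weights, with a sequential decoder. It is not the improved encoder or the parallel decoder proposed in the paper, whose details the abstract does not give. The zigzag mapping of signed values, the bit-string representation, and all names are assumptions chosen for readability.

```python
def zigzag(n):
    """Map signed to unsigned: 0, -1, 1, -2, 2 ... -> 0, 1, 2, 3, 4 ..."""
    return 2 * n if n >= 0 else -2 * n - 1

def unzigzag(u):
    """Inverse of zigzag."""
    return u // 2 if u % 2 == 0 else -((u + 1) // 2)

def rice_encode(values, k):
    """Encode each value as a unary quotient, a '0' stop bit, then k remainder bits."""
    bits = []
    for v in values:
        u = zigzag(v)
        q, r = u >> k, u & ((1 << k) - 1)
        bits.append("1" * q + "0")
        if k:
            bits.append(format(r, "0{}b".format(k)))
    return "".join(bits)

def rice_decode(bits, k, count):
    """Sequentially decode `count` values from a bit string."""
    out, pos = [], 0
    for _ in range(count):
        q = 0
        while bits[pos] == "1":  # unary part
            q += 1
            pos += 1
        pos += 1  # skip the '0' stop bit
        r = int(bits[pos:pos + k], 2) if k else 0
        pos += k
        out.append(unzigzag((q << k) | r))
    return out

# Made-up sparse weights: small magnitudes give short codewords.
weights = [0, -1, 2, 0, 0, 5, -3]
stream = rice_encode(weights, k=1)
print(stream, len(stream), "bits")
print(rice_decode(stream, k=1, count=len(weights)))
```

Small magnitudes dominate pruned or low-bit-width weight tensors, which is why a code that spends few bits on values near zero suits this setting.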

10:10
LeQC-AT: Learning Quantization Configurations during Adversarial Training for Robust Deep Neural Networks

ABSTRACT. Due to the high feature learning capability of Deep Neural Networks (DNNs), they are widely used in state-of-the-art machine learning tasks such as image and text recognition and natural language processing. However, in the recent developments of deep learning models, it has been observed that specially crafted inputs (adversarial samples) can deceive DNNs and result in incorrect predictions with high confidence. Such incorrect predictions by adversarially attacked DNNs can have devastating results in safety-critical applications. This situation can be further exacerbated in quantized DNNs, which employ reduced precision numbers to reduce the overall computational complexity of DNNs. To this end, this work proposes a framework for the joint optimization of DNNs to improve the natural accuracy of quantized DNNs and increase the robustness of the quantized models against adversarial attacks. In particular, the proposed framework employs quantization step size aware adversarial training of DNNs. Our proposed framework is generic and can be utilized with any quantization scheme that allows learning of quantization configurations during training. Furthermore, we present a novel loss function for adversarial training to improve the quantized networks’ accuracy. For example, our 3-bit quantized adversarial training of ResNet-18, ResNet-34, and WideResNet shows up to 21.63%, 24.49%, and 15.08% higher inference accuracy with attacked data, respectively, when compared to 3-bit vanilla quantized adversarial training on benchmark datasets.

10:40
High Throughput and Low Bandwidth Demand: Accelerating CNN Inference Block-by-block on FPGAs

ABSTRACT. A multitude of accelerators have been designed to accelerate the inference of widely-used Convolutional Neural Networks (CNNs). They can primarily be classified into two architectures: the Overlay architecture, which accelerates layer by layer, and the Dataflow architecture, which accelerates the entire model. Each has its own strengths and weaknesses, and we propose a novel architecture to capitalize on their strengths and mitigate their weaknesses. Our architecture allows optimization for different target network structures. Internally, it features multiple PE Arrays, and during runtime, it can flexibly switch between different interconnection modes among these PE Arrays: the serial execution mode, accelerating an entire block at once, and the parallel execution mode, accelerating a single layer at a time. Its minimum area requirement is close to that of typical Overlay architecture accelerators, while its throughput per unit area far exceeds that of existing state-of-the-art accelerators. When executing the widely-used 8-bit quantized MobileNetV2, we observed an exceptionally high throughput of 2331 FPS (frames per second) on the mid-range FPGA ZU7EV, while requiring only 2.89 GiB/s of off-chip memory bandwidth. Running other networks also demonstrated high efficiency and reasonable off-chip memory bandwidth requirements.

10:55
HW/SW Collaborative Techniques for Accelerating TinyML Inference Time at No Cost

ABSTRACT. With the unprecedented boom in TinyML development, optimizing Artificial Intelligence (AI) inference on resource-constrained microcontrollers (MCUs) is of paramount importance. Most existing works focus on reducing peak memory or computation. Tasks are partitioned in a patch-based or device-based manner during execution. However, this comes at the price of latency and communication overhead. In this paper, we propose several techniques to accelerate the Convolutional Neural Network (CNN) inference process. These techniques are both architecture- and application-aware. From the application perspective, we 1) maximize computation reuse through instruction reordering, 2) fuse several linear layers together to improve computation patterns, and 3) enable memory reuse of intermediate buffers to improve memory behavior. From the architecture perspective, we propose techniques that take into account knowledge about the underlying architecture of the MCU, including 1) cache-aware and 2) multi-core parallelism-aware techniques. These solutions only require general MCU features, thus demonstrating broad generalization across various networks and devices. These techniques come at no additional cost. They improve the inference latency without any compromise of the model accuracy or the model size. Our evaluation on a use case from the health-care domain with a real data set for four CNNs (LeNet, AlexNet, ResNet20, and SqueezeNet) shows that we achieve up to 71% reduction in inference latency.

09:40-11:10 Session 8C: European Projects in Digital Systems Design – 2
Location: Room 116
09:40
Digital twins benefits and challenges from intelligent motion control point of view

ABSTRACT. This paper analyses the digital twins’ current status and future interests in the intelligent motion control domain based on the web survey conducted within a large European research project named IMOCO4.E. We aim to understand how the companies and institutions developing and utilizing digital twin technologies see the main benefits, challenges and next steps in their domain. We conclude, based on the survey, that digital twins are used, primarily in the design phase, to speed up the development of complex systems where utilizing physical prototypes is not feasible or even possible. The operational phase utilization is not yet seen as important, but it has great potential, especially when the development of digital twins has become more mature in relation to real-time interfacing, exact representation of real physical objects, and maintainability of implementation.

10:10
The TEXTAROSSA Project: Cool All the Way Down to the Hardware

ABSTRACT. The TEXTAROSSA project aims to bridge technology gaps in order to address the performance and energy efficiency challenges that exascale computing systems will face in the near future. This project aims to provide solutions for improved energy efficiency and thermal control, seamless integration of heterogeneous accelerators in HPC multi-node platforms, and new arithmetic methods. These challenges are tackled through a co-design approach to heterogeneous HPC solutions, supported by the integration and extension of HW and SW IPs, programming models, and tools derived from European research.

10:40
Security Assessments for Networks and Services in 5G Networks

ABSTRACT. 5G networks, and their future iterations like 6G, are vital for the EU's digital transformation and global competitiveness. Given their role in connecting EU systems and critical infrastructures, addressing security, privacy, and trust challenges is crucial. Securing these networks requires high-quality technical solutions and operational collaboration among stakeholders. The SAND5G research project, funded by the Digital Europe Programme (DIGITAL) under the Cybersecurity and Trust (DIGITAL-ECCC-2022-CYBER-B-03) call, aims to deliver a risk assessment platform to help 5G stakeholders secure their systems, enable national authorities to oversee security measures, and align with European cybersecurity policies and the proposed EU toolbox for 5G security.

14:00-15:30 Session 9A: Vision, Image, Numbers and Functions
Location: Room 109
14:00
Event-based vision on FPGA - a survey

ABSTRACT. In recent years there has been a growing interest in event cameras, i.e. video sensors that record changes in illumination independently for each pixel. This type of operation ensures that acquisition is possible in very adverse lighting conditions, both in low light and high dynamic range, and reduces average power consumption. In addition, the independent operation of each pixel results in low latency, which is desirable for robotic solutions. Nowadays, Field-Programmable Gate Arrays (FPGAs), along with general-purpose processors (GPPs/CPUs) and programmable graphics units (GPUs), are popular architectures for implementing and accelerating computing tasks. In particular, their usefulness in the embedded vision domain has been repeatedly demonstrated over the past 30 years, where they have enabled fast data processing (even in real-time) and energy efficiency. Hence, the combination of event cameras and reprogrammable devices seems to be a good solution, especially in the context of energy-efficient real-time embedded systems. This paper gives an overview of the most important work where FPGAs have been used in different contexts to process event data. It covers applications in the following areas: filtering, stereovision, optical flow, acceleration of AI-based algorithms (including spiking neural networks) for object classification, detection and tracking, and applications in robotics and inspection systems. Current trends and challenges for such systems are also discussed.

14:30
An Energy-Efficient Artefact Detection Accelerator on FPGAs for Hyper-Spectral Satellite Imagery

ABSTRACT. Hyper-Spectral imaging (HSI) is a crucial technique used to analyse remote sensing data acquired from Earth observation satellites. The rich spatial and spectral information obtained through HSI allows for better characterisation and exploration of the Earth’s surface over traditional techniques like RGB and Multi-Spectral imaging on the downlinked image data at ground stations. In some cases, these images do not contain meaningful information due to the presence of clouds or other artefacts, limiting their usefulness. Transmission of such artefact HSI images leads to wasteful use of already scarce energy and time costs required for communication. While detecting such artefacts prior to transmitting the HSI image is desirable, the computational complexity of these algorithms and the limited power budget on satellites (especially CubeSats) are key constraints. This paper presents an unsupervised learning-based convolutional autoencoder (CAE) model for artefact identification of acquired HSI images at the satellite and a deployment architecture on AMD’s Zynq Ultrascale FPGAs. The model is trained and tested on widely used HSI image datasets: Indian Pines, Salinas Valley, the University of Pavia and the Kennedy Space Center. For deployment, the model is quantised to 8-bit precision, fine-tuned using the Vitis-AI framework and integrated as a subordinate accelerator using AMD’s Deep-Learning Processing Units (DPU) instance on the Zynq device. Our tests show that the model can process each spectral band in an HSI image in 4 ms, 2.6× better than INT8 inference on Nvidia’s Jetson platform and 1.27× better than SOTA artefact detectors. Our model also achieves an F1-score of 92.8% and FPR of 0% across the dataset, while consuming 21.52 mJ per HSI image, 3.6× better than INT8 Jetson inference and 7.5× better than SOTA artefact detectors, making it a viable architecture for deployment in CubeSats.

14:45
TAP: Task-Aware Profiling on Integrated Systems

ABSTRACT. Hardware Performance Monitors (HPM) are increasingly exploited for timing verification and validation of time-critical embedded systems (TECS). HPMs are typically collected at the lowest software level, which makes it difficult to unequivocally attribute events to specific run-time entities, a prerequisite for any form of analysis, without relying on ad-hoc support from the run-time or operating system layer. The latter, however, is either unavailable or not fully adequate for verification requirements. Moreover, timing-related concerns in the analysis of embedded systems are typically addressed in the final stages of the software development process where multiple tasks are fully or partially integrated on the platform and it is therefore hard, if not impossible, to enforce controlled testing scenarios where contributions to event counts can be dissected. In this work, we present TAP, a generic concept for allowing Task-Aware Profiling of individual tasks in an already integrated system on MPSoCs with on-core and off-core HPM support. The proposed approach combines a lightweight user-level configurable API and minimally intrusive extensions to the operating system layer to enforce separation of contexts when collecting HPM. We implement and assess TAP on top of an Infineon AURIX MPSoC and the OSEK-compliant ERIKA Enterprise RTOS, offering a consistent and intuitive interface for governing and filtering the different sources of events. Our results on synthetic and automotive benchmarks show that TAP can transparently gather and filter the events of interest while incurring negligible overheads.

15:15
Precision and Power Efficient Piece-wise-Linear Implementation of Transcendental Functions

ABSTRACT. The proposed piece-wise-linear (PWL) method utilizes the Method of Least Squares to implement transcendental functions such as Sigmoid and Hyperbolic-Tangent with controllable Maximum Absolute Error (MAE) in hardware. In addition, the quantization process is emulated in software to determine the fractional bit size required for hardware realization without introducing additional truncation errors. This enables effective control of error characteristics by designers. Implementation is carried out in sign-magnitude fixed-point and IEEE 754 half-precision floating-point formats in hardware. An Application Specific Integrated Circuit (ASIC) synthesis is performed using Cadence’s gpdk 90 nm technology nodes for hardware characterization of realized functions using the novel method. The proposed PWL implemented Sigmoid function resulted in 6.77% improvement in MAE, with 41% less delay and 70.6% power savings over the best of the SOTA works reported so far. Similarly, the proposed approximated Hyperbolic-Tangent function illustrated 10.3% better MAE, with 10.3% less delay and 45.3% power savings. Overall, the proposed PWL realized approximate function is established as a power-efficient and precision-effective design. These designs are a step towards building power-efficient hardware accelerators for AI workloads. All the hardware designs are made freely available for further use by the research and design community.
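
The general idea of a least-squares piece-wise-linear approximation with a measurable maximum absolute error can be sketched in software as below. This is only an illustration under assumptions: uniform segment boundaries, an input range of [-8, 8], densely sampled fitting points, and floating-point arithmetic are choices made here, not details taken from the paper, and the fixed-point quantization step is omitted.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def fit_pwl(f, lo, hi, segments, samples=64):
    """Fit one least-squares line per uniform segment of [lo, hi]."""
    width = (hi - lo) / segments
    table = []
    for s in range(segments):
        a = lo + s * width
        xs = [a + width * i / (samples - 1) for i in range(samples)]
        slope, intercept = fit_line(xs, [f(x) for x in xs])
        table.append((a + width, slope, intercept))  # (upper bound, slope, intercept)
    return table

def eval_pwl(table, x):
    """Evaluate using the first segment whose upper bound covers x."""
    for upper, slope, intercept in table:
        if x <= upper:
            return slope * x + intercept
    _, slope, intercept = table[-1]
    return slope * x + intercept

def max_abs_error(f, table, lo, hi, points=2001):
    """Maximum absolute error over a dense grid on [lo, hi]."""
    grid = (lo + (hi - lo) * i / (points - 1) for i in range(points))
    return max(abs(f(x) - eval_pwl(table, x)) for x in grid)

for segments in (4, 16):
    table = fit_pwl(sigmoid, -8.0, 8.0, segments)
    print(segments, max_abs_error(sigmoid, table, -8.0, 8.0))
```

Increasing the number of segments lowers the maximum absolute error at the cost of a larger coefficient table, which is the kind of trade-off a designer would tune.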

14:00-15:30 Session 9B: Advanced Systems for Health Wellness and Personal Monitoring
Location: Room 106
14:00
Synchronisation of a Multimodal Sensing Setup for Analysis of Conservatory Pianists

ABSTRACT. In music performance research, conservatory pianists have been a subject of interest. However, previous studies have often relied on data captured from unsynchronised devices, leaving a significant portion of potential data unexplored. This paper introduces a synchronisation method for a multimodal sensor setup to address this gap. The primary focus of this paper is the implementation of a synchronization mechanism. The proposed technique is a cheap and robust way to synchronise the required multimodal setup. The insights derived from this research can improve future studies and methodologies in musical performance research and are more generically applicable towards any multimodal sensor setup.

14:22
In-Sensor Self-Calibration Circuit of MEMS Pressure Sensors for Accurate Localization

ABSTRACT. This paper presents an innovative real-time self-calibration unit designed to enhance the accuracy of pressure sensors following thermal stress. In this way, its employment has been enabled for the precise localization of people in case of emergencies or in situations where mobility is impaired. The proposed unit comprises a trigger module, which detects uncalibrations, and an error estimator module, which is activated by the trigger and estimates the error to be applied to pressure values through a compact reconfigurable neural network. The system offers reconfigurability, enabling adaptation to various scenarios, such as post-soldering and prolonged exposure to temperatures beyond the nominal range. Validation of the unit was conducted on LPS22HH pressure sensors at STMicroelectronics laboratories. Results demonstrate its capability to recover up to 1.6 hPa and effectively restore accuracy within a nominal range of 0.5 hPa. The system was implemented using STMicroelectronics BCD8 technology, featuring a core area of 0.55 mm² and dynamic power consumption of 4.46 nW in the best scenario. These findings underscore the potential for integrating the system near the sensor, thus realizing an enhanced smart pressure sensor, particularly suited for demanding applications in Industry 4.0, where accurate sensors are indispensable.

14:44
FPGA Design of Digital Circuits for Phonocardiogram Pre-Processing Enabling Real-Time and Low-Power AI Processing

ABSTRACT. Cardiovascular Diseases (CVDs) stand as the leading cause of mortality worldwide. Detecting subtle heart sound alterations in the early stages of CVDs can be crucial for an initial effective treatment. Accordingly, the analysis of Phonocardiograms (PCGs) through segmentation could be helpful for CVD screening. The state-of-the-art algorithm for this task is based on a Convolutional Neural Network (CNN) with an encoding-decoding topology. Prior to the CNN processing, a computationally intensive input pre-processing, based on envelope extraction, is needed. Thus, achieving real-time performance can be challenging. The main goal of this study is the hardware design, implementation, and evaluation of four PCG pre-processing circuits aiming at the design of a low-power point-of-care device for real-time Artificial Intelligence (AI)-based PCG segmentation. Results have shown that the approximations introduced by the fixed-point format and this innovative architecture have a negligible impact on the AI segmentation quality. Finally, the pre-processing chain is real-time compliant, achieving a maximum latency of 257 ms for processing a PCG patch within a 1.27 s window, while requiring only 6.1 mW of power.

15:06
Low-Power Implementation of a U-Net-based Model for Heart Sound Segmentation on a Low-Cost FPGA

ABSTRACT. This work presents the FPGA implementation of a U-Net-based model for heart sound segmentation, building upon prior hardware optimization efforts. By converting model parameters to Read-Only Memories (ROMs) instead of AXI ports, Block Random Access Memory (BRAM) consumption decreased significantly, from 99% to 58%. Additionally, latency was reduced from 29.27 to 17.66 ms as estimated during High-Level Synthesis (HLS) Cosimulation. The U-Net block was integrated into a block design, connected to the Zynq Processing System (PS), facilitating model evaluation on the PYNQ-Z2 board. Model accuracy reached 91.14%, close to HLS C simulation results and high-level Python description. FPGA measured latency was 17.77±0.01 ms, achieving real-time performance, with power consumption estimated at 134±14 mW. Energy per inference was calculated at 2.38±0.07 mJ. A power reduction study showed a 22% decrease in minimum power consumption compared to default settings, yet significant energy consumption decreases were not observed. This study offers insights for future optimizations, highlighting the applicability of FPGA-based heart sound segmentation in real-world scenarios, and setting the specifications for a potential hand-held device based on this design.

14:00-15:30 Session 9C: European Projects in Digital Systems Design – 3
Location: Room 116
14:00
The LoLiPoP-IoT Project: Long Life Power Platforms for Internet of Things

ABSTRACT. The LoLiPoP-IoT project aims to pioneer Long Life Power Platforms for IoT to extend battery life, minimize maintenance, and facilitate installation within existing environments. With a focus on supporting an inclusive ecosystem of developers, integrators, coordinators, and users, the project's Grand Objectives encompass a range of aims, including providing long-lasting battery solutions, reducing battery waste, enhancing asset tracking and predictive maintenance, and improving energy efficiency in buildings. These objectives are realized through nine selected practical applications across three primary domains: Asset Tracking, Condition Monitoring and Predictive Maintenance, and Energy Efficiency and Comfort in Buildings. Expected impacts of the LoLiPoP-IoT project include significantly extended battery life, reduced maintenance overhead, decreased costs associated with asset location, improved asset management efficiency, enhanced building comfort with reduced energy consumption, and substantial revenue generation for industry partners. The project's strategic objectives are notably harmonized with key EU initiatives outlined in the Green Deal, Circular Economy, and the New Industrial Strategy for Europe.

14:30
The METASAT Modelling and Code Generation Toolchain for XtratuM and Hardware Accelerators

ABSTRACT. Given the complex design requirements of future satellites and the need to comply with ECSS standards, the METASAT project introduces an innovative design approach that uses model-based engineering and is supported by open architecture hardware. The project highlights the importance of software virtualisation layers, such as hypervisors, in achieving compliance with standards on high-performance computing platforms. The project is dedicated to creating a specialised toolchain for these advanced hardware and software layers. We believe that without such innovations, the satellite industry could face increased and unsustainable development costs and timelines, potentially compromising its competitiveness and dependability. This paper provides a general overview of the METASAT project and introduces the approaches being used to add support for the XtratuM hypervisor and code generation for hardware accelerators in a model-based engineering workflow.

15:00
Trusted SMEs for Sustainable Growth of Europeans Economical Backbone to Strengthen the Digital Sovereignty: The KDT Resilient Trust Project
PRESENTER: Luigi Pomante

ABSTRACT. The Internet of Things is promising as it drives the datafication of our everyday life and thus leverages synergies between things originally considered “dead” and enables them to proactively serve humans. IoT5.0, an Artificial Intelligence assisted Internet of Things, could benefit society even more, as the devices could even learn how to provide more value. But the ubiquitous connectivity comes at a cost. Security levels have to rise tremendously to ensure a network stays secure and safe for humans. This additional effort is often a burden for small and medium-sized enterprises, as the complexity and security demands of such systems rise faster than available resources. Consequently, RESILIENT TRUST focuses on end-to-end security of IoT processing chains with a focus on strong exploitation for SMEs. Moreover, RESILIENT TRUST will address and significantly mitigate the major risks to enable IoT5.0. That way, this project will be a driver for sustainable development and the generation of convenience and wealth. A solution is proposed to ensure end-to-end security by boosting RESILIENCE and TRUST along different key supply chains of IoT devices.