Title: From Traditional to Digital: How software, data and AI are transforming the embedded systems industry
Abstract: With digitalization and with technologies such as software, data, and artificial intelligence, companies in the embedded systems domain are experiencing a rapid transformation of their conventional businesses. While the physical products and associated product sales provide the core revenue, these are increasingly being complemented with service offerings, new data-driven services, and digital products that allow for continuous value creation and delivery to customers. This talk explores the difference between what constitutes a traditional and a digital company and details the typical evolution path embedded systems companies take when transitioning towards becoming digital companies. The talk focuses on the changes associated with business models, ways-of-working and ecosystem engagements and provides concrete examples based on action-oriented research conducted in close collaboration with companies in the embedded systems domain.
Expomeloneras's Hall
Evaluating Cryptographic Extensions On A RISC-V Simulation Environment ABSTRACT. Due to the security requirement in the widely-deployed embedded applications, lightweight cryptographic ciphers have been offered and used in resource-constrained devices in the last decades. In addition to the intrinsic low-cost properties of these ciphers, implementation- and architecture-specific techniques can make the implementation of these ciphers even more efficient. In this paper, we propose a simulation environment for the open-source RISC-V Instruction Set Architecture (ISA) implementing the base RISC-V ISA as well as the "bit manipulation" instruction set extension (ISE), which facilitates the implementation of (lightweight) symmetric cryptography algorithms on resource-constrained devices efficiently. For demonstration purposes, we implement the lightweight block ciphers LEA, SIMON, and SPECK on our simulator and evaluate the performance of these ciphers on RISC-V architecture implemented with and without bit manipulation instructions. In our work, we define the performance of the lightweight ciphers as the total number of clock cycles required to encrypt one block of plaintext successfully. The performance of lightweight ciphers gives us an insight on how the performance of a cipher can be improved by using specific bit manipulation instructions. Our results show an average 38% improvement in the total number of clock cycles required to run lightweight ciphers while using bit manipulation instructions. |
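To illustrate why bit manipulation instructions can matter for ciphers such as SPECK, the following is a minimal Python sketch of a single SPECK round on 16-bit words (rotation amounts 7 and 2, as in the Speck32 variant). It is not taken from the paper: the abstract does not say which instructions were used, so the focus on rotations is an assumption. On a base ISA without a rotate instruction, each rotation is typically composed of two shifts and an OR, as the helper functions spell out, whereas a rotate instruction does the same work in one step.

```python
# Illustrative sketch only; names and parameters are assumptions, not the paper's code.
WORD_BITS = 16
MASK = (1 << WORD_BITS) - 1
ALPHA, BETA = 7, 2  # rotation amounts for 16-bit SPECK words


def ror(value, amount):
    """Rotate right, written as two shifts and an OR (no rotate instruction)."""
    return ((value >> amount) | (value << (WORD_BITS - amount))) & MASK


def rol(value, amount):
    """Rotate left, written as two shifts and an OR (no rotate instruction)."""
    return ((value << amount) | (value >> (WORD_BITS - amount))) & MASK


def speck_round(x, y, round_key):
    """One SPECK encryption round: rotate, add, XOR key, rotate, XOR."""
    x = ((ror(x, ALPHA) + y) & MASK) ^ round_key
    y = rol(y, BETA) ^ x
    return x, y


def speck_round_inverse(x, y, round_key):
    """Inverse of speck_round, used here only to check the sketch."""
    y = ror(y ^ x, BETA)
    x = rol(((x ^ round_key) - y) & MASK, ALPHA)
    return x, y
```

Each round performs two rotations, so the cost of a rotation is paid repeatedly over the full cipher. This is one plausible place where a dedicated rotate instruction would save cycles.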
Towards a Real-Time Smart Prognostics and Health Management (PHM) of Safety Critical Embedded Systems ABSTRACT. The use of machine learning techniques for the prognostics and health management of safety critical embedded systems is an area that has attracted increased interest recently. This paper investigates an implementable machine learning pipeline to address PHM requirements. Different types of machine learning techniques are evaluated on a real case study considering accuracy and real-time effectiveness. The system under study exhibited abnormal behaviour with multiple faults during an observed period, which led to low system availability and required maintenance, disturbing normal operation. This paper presents a system approach to fault finding that aims to generalize to different electronic systems. We started with a thorough review of the available features, followed by a consistent hyperparameter optimization across all the techniques, ensuring a comparable baseline. Finally, the results were compared according to defined evaluation and performance criteria, using a) metrics such as mean square error and R-squared to determine how well the model fitted the dataset, and b) execution times for their fit and predict methods for different training and test sizes, since this model is expected to run on real-time systems. All models predicted the output state of the system very well, but the stacked models outperformed the remaining ones. The proposed framework has been tested and validated on a real application case. |
On the Characterization of Jitter in Ring Oscillators using Allan variance for True Random Number Generator Applications ABSTRACT. The description of the physical noise source is of utmost importance for any TRNG certification. In the case of ring oscillators, Allan variance is a reliable and comprehensive tool which enables the distinction between the jitter coming from flicker noise (autocorrelated) and from thermal noise (random). In this paper, we realize measurements directly on the analog source of numerous TRNG structures: a ring oscillator. Our data, along with evidence from the literature, indicates the presence of a third noise source. The quantization noise has not been so far taken into consideration, but is unquestionably present in all TRNG reliant on jitter. Its importance is key to an accurate estimation of thermal noise contribution, which can be shadowed by it. Measurements presented in this paper show that insufficient sampling (i.e. higher quantization noise) can lead to an overestimation of jitter coming from thermal noise and therefore an overestimation of the calculated entropy. The latter is the only type of noise considered reliable in currently existing ring oscillator-based TRNG models. As a rule of thumb, three orders of magnitude between the sampling and the sampled signal are necessary in order to determine the thermal noise jitter correctly. |
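As a rough illustration of the tool named in this abstract, the sketch below computes the standard non-overlapping Allan variance of a sequence of measurements for a given averaging factor. It is a textbook estimator, not the authors' measurement procedure; applying it to a list of oscillator period samples, and the function and parameter names, are assumptions made for the example.

```python
# Illustrative sketch only: textbook non-overlapping Allan variance estimator.
def allan_variance(samples, block_size):
    """Allan variance of `samples` averaged over blocks of `block_size` values.

    The samples are split into consecutive blocks, each block is averaged,
    and the result is half the mean squared difference of adjacent averages.
    """
    n_blocks = len(samples) // block_size
    if n_blocks < 2:
        raise ValueError("need at least two complete blocks")
    means = [
        sum(samples[i * block_size:(i + 1) * block_size]) / block_size
        for i in range(n_blocks)
    ]
    squared_diffs = [(b - a) ** 2 for a, b in zip(means, means[1:])]
    return sum(squared_diffs) / (2 * len(squared_diffs))
```

Evaluating the function for a range of block sizes gives the variance as a function of averaging time, which is the curve typically inspected to separate noise contributions.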
Energy-Efficient Radix-4 Belief Propagation Polar Code Decoding Using an Efficient Sign-Magnitude Adder and Clock Gating ABSTRACT. Polar encoding is the first information coding method that has been proven to achieve channel capacity for binary-input discrete memoryless channels. Since its introduction, much research has been done on improving decoding performance, lowering execution time and power usage. Classic belief propagation uses radix-2 decoding, but a recent study proposed radix-4 decoding which reduces memory usage by 50%. However a drawback is its higher computational complexity, negatively impacting power usage and throughput. In this paper we present a low-power radix-4 belief propagation polar decoder architecture. We propose a new, efficient sign-magnitude adder that does not require conversion to two's complement and back. We also propose using clock gating of input values by checking if all R inputs of the decoder are zero. These two key contributions lead to a lower power usage and higher maximum clock speed and throughput. Post-layout simulation results show that compared to the previously proposed P(1024,512) radix-4 belief propagation polar code decoder, our decoder uses between 30.22% and 32.80% less power and is 5.2% smaller at the same clock speed. Also, our design can achieve a 15.7% higher clock speed at which it is still up to 10.76% more power efficient and 4.8% smaller. |
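To make the idea of adding sign-magnitude values without a round trip through two's complement concrete, here is a small behavioural sketch in Python. It only models the arithmetic rule (compare signs, then add or subtract magnitudes); the circuit proposed in the paper is not described in the abstract, so this should not be read as its design. The tuple representation and the function name are assumptions.

```python
# Illustrative behavioural model only; not the adder circuit from the paper.
def sign_magnitude_add(a, b):
    """Add two (sign, magnitude) pairs, where sign is 0 (plus) or 1 (minus).

    Equal signs: add the magnitudes and keep the sign.
    Different signs: subtract the smaller magnitude from the larger one and
    take the sign of the larger operand. Zero is always returned as (0, 0).
    """
    (sign_a, mag_a), (sign_b, mag_b) = a, b
    if sign_a == sign_b:
        total = mag_a + mag_b
        return (sign_a if total else 0, total)
    if mag_a == mag_b:
        return (0, 0)
    if mag_a > mag_b:
        return (sign_a, mag_a - mag_b)
    return (sign_b, mag_b - mag_a)
```

For example, `sign_magnitude_add((0, 5), (1, 3))` models 5 + (-3) and operates on the magnitudes directly. A real hardware adder would also need to handle bit-width limits, which this sketch ignores.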
A holistic hardware-software approach for fault-aware embedded systems ABSTRACT. Fault detection and fault tolerance are already a crucial part of many embedded systems and will become even more important in the future. Reasons are the increasing complexity of software used in safety-critical environments and the trend to execute software components with varying criticality on the same hardware. We propose a novel approach for flexible and adaptive fault handling. Our approach combines an adaptive hardware architecture with a flexible runtime environment to detect and handle faults. In this paper, we present the structure of a tile-based many-core architecture with runtime-adaptive lockstep cores and the design of a flexible dataflow software framework utilizing this hardware platform. We show that the hardware overhead for our adaptive lockstep concept and the hardware requirements of our runtime environment are small and thus allow use in embedded systems. Furthermore, we verified the fault detection and correction capabilities of both the hardware and software via a hardware fault injection mechanism. Lastly, our runtime evaluation shows promising results for different redundancy concepts. For this purpose, we compare the execution time of software-only and hardware-only redundancy solutions as well as combinations of both with a non-redundant baseline for different benchmark applications. |
PRESENTER: Gabriele Montanaro ABSTRACT. The recent advances in quantum computers impose the adoption of post-quantum cryptosystems into secure communication protocols. This work proposes two FPGA-based, client- and server-side hardware architectures to support the integration of the BIKE post-quantum KEM within TLS. Thanks to the parametric hardware design, the paper explores the best option between hardware and software implementations, given a set of available hardware resources and a realistic use-case scenario. The experimental evaluation comparing our client and server designs against the reference AVX2 and hardware implementations of BIKE highlighted two aspects. First, the proposed client and server architectures outperform the reference hardware implementation of BIKE by eight and four times, respectively. Second, the performance comparison between our client and server designs against the reference AVX2 implementation strongly depends on the available resource. Our solution is almost twice as fast as the AVX2 implementation while implemented on the Artix-7 200 FPGA, while it is up to six times slower when targeting smaller FPGAs, thus motivating a careful analysis of the available hardware resources and the optimization of the design’s parallelism before opting for hardware support. |
Adaptive Exploration Based Routing for Spatial Isolation in Mixed Criticality Systems PRESENTER: Nidhi Anantharajaiah ABSTRACT. Applications of different criticality are increasingly sharing the same System-on-Chip platform to be cost and resource effective. On such mixed criticality systems, spatial partitioning of resources is a commonly utilized technique to prevent interference between applications. At the communication level, Network-on-Chip (NoC) used in such systems can aid by isolating network traffic within application regions. Topologies that can develop in such partitions can be regular or irregular, requiring minimal and non-minimal routing. For the NoC to be flexible and support such varying network parameters, it is desirable that the routing algorithm can support communication for all possible topologies. Here, we investigate a topology agnostic routing algorithm based on Ant Colony Optimization (ACO) metaheuristic. The routing algorithm explores the NoC for feasible paths using special ant packets and discovers paths based on history of already utilized paths and local traffic information. We aim to decrease the exploration time overhead, by proposing an adaptive exploration technique. Compared to the static version, the proposed technique can decrease the exploration time overhead by up to 68% while maintaining comparable latency and throughput. |
10:30 | Sargantana: A 1 GHz+ In-Order RISC-V Processor with SIMD Vector Extensions in 22nm FD-SOI PRESENTER: Víctor Soria ABSTRACT. The RISC-V open Instruction Set Architecture (ISA) has proven to be a solid alternative to licensed ISAs. In the past 5 years, a plethora of industrial and academic cores and accelerators have been developed implementing this open ISA. In this paper, we present Sargantana, a 64-bit processor based on RISC-V that implements the RV64G ISA, a subset of the vector instructions extension (RVV 0.7.1), and custom application-specific instructions. Sargantana features a highly optimized 7-stage pipeline implementing out-of-order write-back, register renaming, and a non-blocking memory pipeline. Moreover, Sargantana features a Single Instruction Multiple Data (SIMD) unit that accelerates domain-specific applications. Sargantana achieves a 1.26 GHz frequency in the typical corner, and up to 1.69 GHz in the fast corner using 22nm FD-SOI commercial technology. As a result, Sargantana delivers a 1.77× higher Instructions Per Cycle (IPC) than our previous 5-stage in-order DVINO core, reaching 2.44 CoreMark/MHz. Our core design delivers comparable or even higher performance than other state-of-the-art academic cores under the Autobench EEMBC benchmark. This way, Sargantana lays the foundations for future RISC-V based core designs able to meet industrial-class performance requirements for scientific, real-time, and high-performance computing applications. |
10:55 | Suitability of ISAs for Data Paths Based on Redundant Number Systems: Is RISC-V the Best? ABSTRACT. It has been known for a long time that in processor design, delay in arithmetic circuits can be reduced by using redundant number representations (RNS). This advantage is currently only exploited to a limited extent since various aspects complicate its use. For this reason the redundant representation is abandoned at the boundaries of the ALU and the values are converted back to the traditional binary representation. In particular, some operations that are traditionally considered fast are now subject to a higher delay. Among other concerns, this complicates comparison operations (e.g. equal to, greater than) and thus affects the timing behavior of conditional jumps. There is some initial research promising speedups using RNS in register files and the data path, but there are still some open questions. In particular it is important to evaluate how the instruction set is designed. Such a study is necessary to estimate whether it is worthwhile to develop the entire data path beyond the ALU in redundant representation. If no reconversion from redundant to traditional binary number system takes place, then e.g. the evaluation of condition codes or flags is problematic, since all speed advantages are lost again. In this work a qualitative and quantitative analysis of common ISAs in processors with redundant data paths is presented. All relevant properties of an ISA are identified, and several common ISAs are evaluated according to these criteria. A performance comparison of three common RISC ISAs (MIPS, A64 (ARM), RISC-V) is given based on a simulation of the Embench benchmark suite using an adapted version of QEMU. This comparison estimates the speedup of processors with redundant versus binary data paths. It was found that RISC-V overall outperformed the other ISAs, with a maximum speedup of 1.41. |
11:20 | A Resilient System Design to Boot a RISC-V MPSoC ABSTRACT. This paper presents a highly resilient multi-mode boot process design for a new RISC-V based multiprocessor system-on-chip. The original PULP-platform boot process was significantly modified due to changes in the processor core, the system architecture and the requirement for guaranteed chip wake-up. We outline the characteristic challenges of implementing a large program into a bootROM and propose generally applicable workflows to verify the boot process for application specific integrated circuit synthesis. We implemented four distinct boot modes. Two modes that load a software bootloader autonomously from an SD card are implemented for SDIO and SPI, respectively. Another SDIO based mode allows for direct program execution from external memory, while the last mode is based on a debug module accessed through JTAG. The boot process was verified with instruction set simulation, register transfer level simulation, gate-level simulation, and FPGA prototyping. The confidence in successful booting was achieved and the chip design was handed off to fabrication. |
11:37 | RISC-V Core with Approximate Multiplier for Error-Tolerant Applications PRESENTER: Anu Verma ABSTRACT. RISC-V is an open-source instruction set architecture with customizable extensions to introduce operations like multiplication, division, atomic functions, and floating-point operations. In this paper, a new approximate multiplier is integrated with RI5CY (CV32E40P) processor for integer and floating-point multiplication for error-tolerant applications. The multiplication operation is required in various engineering and scientific applications, including image processing, digital signal processing, and many others. The proposed approximate multiplier is based on linear CORDIC (COordinate Rotation Digital Computer) algorithm and implemented by using only shift-add operations. It can perform multiplication operations and MAC (Multiply-accumulate operation) operations. The FPGA (Field programmable gate arrays) implementation results and ASIC (Application-specific integrated circuit) synthesis results for the proposed approximate multiplier along with RI5CY core are reported. The proposed design with RI5CY core is implemented on FPGA (Field programmable gate arrays) Xilinx Zedboard, which improves the performance by 20 % and reduces power delay product (PDP) by 15.79 % over the existing multipliers of the RI5CY core. Moreover, RI5CY core with proposed approximate multiplier is synthesized using Industrial 130 nm standard cell library (ISCL) and Sub-threshold 130 nm standard cell library (STSCL) in Synopsys DC compiler. In case of STSCL, RI5CY core with proposed approximate multiplier has 11.76 % less power-consumption, 27.27 % less delay, and 38.77 % PDP compared to the existing multipliers of the RI5CY core. |
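The shift-add principle behind a linear CORDIC multiplier can be sketched in a few lines of Python. The code below is the textbook linear-mode CORDIC iteration on fixed-point integers, not the multiplier proposed in the paper, whose details the abstract does not give. The fixed-point format (16 fractional bits), the iteration count, and all names are assumptions. Each iteration uses only a shift, an addition or subtraction, and a sign test, and the product is approximated more closely as iterations are added.

```python
# Illustrative sketch only: textbook linear-mode CORDIC, not the paper's design.
FRAC_BITS = 16
ONE = 1 << FRAC_BITS  # fixed-point representation of 1.0


def to_fixed(value):
    """Convert a float to a fixed-point integer with FRAC_BITS fractional bits."""
    return int(round(value * ONE))


def to_float(value):
    """Convert a fixed-point integer back to a float."""
    return value / ONE


def cordic_multiply(x, z, iterations=16):
    """Approximate x * z using shifts and adds; requires |z| < 2.

    x and z are fixed-point integers. In each step z is driven toward zero
    by adding or subtracting 2**-i, while y accumulates x * 2**-i with the
    matching sign, so y converges to x times the original z.
    """
    y = 0
    for i in range(iterations):
        step = ONE >> i  # 2**-i in fixed point
        if z >= 0:
            y += x >> i
            z -= step
        else:
            y -= x >> i
            z += step
    return y
```

For instance, `to_float(cordic_multiply(to_fixed(1.5), to_fixed(0.75)))` gives a value close to 1.125. Reducing `iterations` trades accuracy for fewer operations, which is the kind of trade-off an approximate multiplier exploits.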
10:30 | TextBack: Watermarking Text Classifiers using Backdooring ABSTRACT. Creating high performance neural networks is expensive, incurring costs that can be attributed to data collection and curation, neural architecture search and training on dedicated hardware accelerators. Stakeholders invested in any one or more of these aspects of deep neural network training expect assurances on ownership and guarantees that unauthorised usage is detectable and therefore preventable. Watermarking the trained neural architectures can prove to be a solution to this. While such techniques have been demonstrated in image classification tasks, we posit that a watermarking scheme can be developed for natural language processing applications as well. In this paper, we propose TextBack, which is a watermarking technique developed for text classifiers using backdooring. We have tested for the functionality preserving properties and verifiable proof of ownership of TextBack on multiple neural architectures and datasets for text classification tasks. The watermarked models consistently generate accuracies within a range of 1-2% of models without any watermarking, whilst being reliably verifiable during watermarking verification. TextBack has been tested on two different kinds of Trigger Sets, which can be chosen by the owner as preferred. We have studied the efficiencies of the algorithm that embeds the watermarks by fine tuning using a combination of Trigger samples and clean samples. The benefit of using TextBack's fine tuning approach on pre-trained models from a computational cost perspective against embedding watermarks by training models from scratch is also established experimentally. |
10:47 | DNAsim: Evaluation Framework for Digital Neuromorphic Architectures ABSTRACT. Neuromorphic architectures implement low-power machine learning applications using spike-based biological neuron models and bio-inspired algorithms. Prior work on mapping Spiking Neural Networks (SNNs) focused mainly on minimizing inter-core spike communication and on specific computing architectures like crossbar memories. SNN mapping choices on a neuromorphic platform can have varying effects on performance. In this paper we introduce a simulation framework that enables the generation and evaluation of SNN mappings on a user-defined neuromorphic hardware model. Our simulator can evaluate performance of applications based on their spike activities, the hardware model and the application’s mapping on the hardware by taking into account inter-core communication as well as the computation load per tile. We create two hardware models based on reported work in literature and show the evaluation of different mapping scenarios on a state-of-the-art SNN. |
11:04 | Quantization: how far should we go? ABSTRACT. Machine learning, and specifically Deep Neural Networks (DNNs), are impacting all parts of daily life. Although DNNs can be large and compute intensive, requiring processing on big servers (like in the cloud), we see a move of DNNs into IoT-edge based systems, adding intelligence to these systems. These systems are often energy constrained and too small to satisfy the huge DNN computation and memory demands. DNN model quantization may come to the rescue. Instead of using 32-bit floating point numbers, much smaller formats can be used, down to binary numbers. Although this may largely solve the compute and memory problems, it comes at a huge price: reduced model accuracy. This problem spawned a lot of research into model repair methods, especially for binary neural networks. Heavy quantization triggers a lot of debate; we even see some movement back to higher precision using brainfloats. This paper therefore evaluates the trade-off between energy reduction through extreme quantization and accuracy loss. This evaluation is based on ResNet-18 with the ImageNet dataset, mapped to a fully programmable architecture with special support for 8-bit and 1-bit deep learning, the BrainTTA. We show that, after applying repair methods, the use of extremely quantized DNNs makes sense. They have superior energy efficiency compared to DNNs based on 8-bit precision of weights and data, while only having a slightly lower accuracy. There is still an accuracy gap, requiring further research, but results are promising. A side effect of the much lower energy requirements of BNNs is that external DRAM becomes more dominant. This certainly requires further attention. |
11:21 | CaW-NAS: Compression Aware Neural Architecture Search PRESENTER: Hadjer Benmeziane ABSTRACT. With the ever-growing demand for deep learning (DL) at the edge, building small and efficient DL architectures has become a significant challenge. Optimization techniques such as quantization, pruning or hardware-aware neural architecture search (HW-NAS) have been proposed. In this paper, we present an efficient HW-NAS, Compression-Aware Neural Architecture search (CaW-NAS), that combines the search for the architecture and its quantization policy. While former works search over a fully quantized search space, we define our search space with quantized and non-quantized architectures. Our search strategy finds the best trade-off between accuracy and latency according to the target hardware. Experimental results on a mobile platform show that our method yields more efficient networks in terms of accuracy and execution time when compared to the state of the art. |
11:38 | Demystifying the TensorFlow Eager Execution of Deep Learning Inference on a CPU-GPU Tandem PRESENTER: Paul Delestrac ABSTRACT. Machine Learning (ML) frameworks are tools that facilitate the development and deployment of ML models. These tools are major catalysts of the recent explosion in ML models and hardware accelerators thanks to their high programming abstraction. However, such an abstraction also obfuscates the runtime execution of the model and complicates the understanding and identification of performance bottlenecks. In this paper, we demystify how a modern ML framework manages code execution from a high-level programming language. We focus our work on the TensorFlow Eager execution, which remains obscure to many users despite being the simplest mode of execution in TensorFlow. We describe in detail the process followed by the runtime to run code on a CPU-GPU tandem. We propose new metrics to analyze the framework’s runtime performance overhead. We use our metrics to conduct in-depth analysis of the inference process of two Convolutional Neural Networks (CNNs) (LeNet-5 and ResNet-50) and a transformer (BERT) for different batch sizes. Our results show the ML framework’s runtime overhead can be significant depending on the configuration. |
10:30 | Development of a Hyperspectral Colposcope for Early Detection and Assessment of Cervical Dysplasia PRESENTER: Carlos Vega García ABSTRACT. The early detection of precancerous cervical lesions is essential to improve patient treatment and prognosis. Current methods of screening and diagnosis have improved the detection of these lesions but still present some critical limitations. Hyperspectral (HS) imaging is emerging as a new non-invasive and label-free imaging technique in the medical field for performing quick diagnosis of different diseases. This work describes the first step in the research and development process to present to the gynaecologist a new non-invasive tool to detect cervical neoplasia during routine medical procedures. This tool is based on a HS camera coupled to a colposcope, a primary tool already used in cervical examinations. The developed HS colposcope was validated by comparing the HS images obtained against the captures obtained with conventional optics. Results show the feasibility of the developed system to start a data acquisition campaign of cervical lesions targeting future developments of algorithms based on artificial intelligence. |
10:55 | Glioblastoma Classification in Hyperspectral Images by Nonlinear Unmixing ABSTRACT. Glioblastoma is considered an aggressive tumor due to its rapid growing rate and diffuse pattern in various parts of the brain. Current in-vivo classification procedures are executed under the supervision of an expert. However, this methodology could be subjective and time-consuming. In this work, we propose a classification method for in-vivo hyper-spectral brain images to identify areas affected by glioblastomas based on a nonlinear spectral unmixing. This methodology follows a semi-supervised approach for the estimation of the end-members in a multi-linear model. To improve the classification results, we vary the number of end-members per-class to address spectral variability of each studied type of tissue. Once the set of end-members is obtained, the classification map is generated according to the end-member with the highest contribution in each pixel, followed by morphological operations to smooth the resulting maps. The classification results demonstrate that the proposed methodology generates good performance in regions of interest identification, with an accuracy performance above 0.75 and 0.96 in the inter- and intra- patient strategies, respectively. These results indicate that the proposed methodology has the potential to be used as an assistant tool in glioblastoma diagnosis in hyperspectral imaging. |
11:20 | Evaluation of artificial neural networks for the detection of esophagus tumor cells in microscopic hyperspectral images ABSTRACT. Microscopic analysis of histological slides of cancer tissue samples is standardly performed under white light microscopy. Artificial intelligence (AI) methods showed promising results for the automatic identification of tumor cells. Hyperspectral imaging (HSI) combined with AI approaches can improve the accuracy, reliability, and time of the analysis. In this work, a HSI camera was coupled with a standard microscope to acquire microscopic hyperspectral (HS) images of stained histological slides of esophagus cancer tissue of 95 patients. The HS images were analyzed with deep learning algorithms to discriminate healthy cells (squamous epithelium) and tumors (stroma tumor and esophagus adenocarcinoma EAC). Five models were considered: a 2D CNN, a 2D CNN preserving the spatial relationship between spectral layers, a 3D CNN, a pre-trained 3D CNN and a recurrent neural network (RNN). They were evaluated using a leave-one-patient-out cross-validation. The predicted two classes were visualized with false colors. The RNN obtained the highest quantitative results with an accuracy of 0.791, an AUC of 0.79 and a computing time of 7.57 s per 10,000 patches. The best visual result was obtained on two selected HS images with the 2D CNN model. The performance of the automatic classification was higher on tissue, which hasn’t been treated with previous neoadjuvant therapy. The combination of HSI with deep learning method is promising for the automatic analysis of histological slides for cancer diagnosis. |
11:37 | Hyperparameter Optimization for Brain Tumor Classification with Hyperspectral Images ABSTRACT. Hyperspectral (HS) imaging (HSI) techniques have proven useful in the medical field to characterize tissues without any contact and without ionizing the patient. Besides, HSI combined with supervised machine learning (ML) algorithms has proven to be an effective technique to assist neurosurgeons in resecting brain tumors. This research looks at the effects of hyperparameter optimization on two common supervised ML algorithms used for brain tumor classification: support vector machines (SVM) and random forest (RF). Correctly classifying brain tumors with HS data containing low spatial and spectral information can be challenging. To tackle this problem, this study has applied hyperparameter optimization techniques on SVM and RF with 10 brain images of patients suffering from glioblastoma multiforme (GBM) with non-mutated isocitrate dehydrogenase (IDH) enzymes. These captures have 409x217 spatial resolution and 25 normalized reflectance wavelengths gathered from 665 to 960 nm with a HS snapshot camera. Results show that this work obtains a test weighted area under the curve (AUC) of 98.60% by employing naive optimizations like grid search (GS) or random search (RS) and even more complex methods based on Bayesian optimization (BO). Not only has the weighted AUC of SVM been improved by 8%, but BO has also enhanced the AUC of the tumor class by 22.50% in comparison with non-optimized SVM models in the state of the art, achieving AUC values of 95.49% on the tumor class. Furthermore, these improvements have been illustrated with classification maps to demonstrate the importance of hyperparameter optimization on SVM to clearly classify brain tumors, whereas non-optimized models from previous studies are unable to detect the tumor. |
10:30 | RED-SEA: Network Solution for Exascale Architectures ABSTRACT. In order to enable Exascale computing, next generation interconnection networks must scale to hundreds of thousands of nodes, and must provide features to also allow the HPC, HPDA, and AI applications to reach Exascale, while benefiting from new hardware and software trends. RED-SEA will pave the way to the next generation of European Exascale interconnects, including the next generation of BXI, as follows: (i) specify the new architecture using hardware-software co-design and a set of applications representative of the new terrain of converging HPC, HPDA, and AI; (ii) test, evaluate, and/or implement the new architectural features at multiple levels, according to the nature of each of them, ranging from mathematical analysis and modelling, to simulation, or to emulation or implementation on FPGA testbeds; (iii) enable seamless communication within and between resource clusters, and therefore development of a high-performance low latency gateway, bridging seamlessly with Ethernet; (iv) add efficient network resource management, thus improving congestion resiliency, virtualization, adaptive routing, collective operations; (v) open the interconnect to new kinds of applications and hardware, with enhancements for end-to-end network services – from programming models to reliability, security, low-latency, and new processors; (vi) leverage open standards and compatible APIs to develop innovative reusable libraries and Fabrics management solutions. |
10:55 | Sense and Control of Oscillating MEMS Mirrors ABSTRACT. Manufacturing advances in the field of micro-electro-mechanical systems (MEMS) enabled the realization of MEMS mirrors. These MEMS mirrors, which are as small as a few square millimeters, are promising candidates for various types of automotive applications requiring laser beam steering. Prime application examples are Head-Up Displays (HUD) or Light Detection and Ranging (LiDAR) sensors. In this work, we present the latest advancements and concepts for sensing, actuating, and controlling oscillating comb-drive based MEMS mirrors. A novel approach towards high-voltage generation and actuation is also presented, which is capable of reducing the MEMS' power dissipation by factors. Furthermore, the performance of the presented control system is demonstrated by means of an automotive LiDAR use-case which was integrated into an automated demo vehicle. |
11:20 | Sentient Spaces: Intelligent Totem Use Case in the ECSEL FRACTAL Project PRESENTER: Luigi Pomante ABSTRACT. The objective of the FRACTAL project is to create a new approach to reliable edge computing. The FRACTAL computing node will be the building block of scalable Internet of Things (from Low Computing to High Computing Edge Nodes). The node will also have the capability of learning how to improve its performance against the uncertainty of the environment. In such a context, this paper presents in detail one of the key use cases: an Internet-of-Things solution represented by intelligent totems for advertisement and wayfinding services within advanced ICT-based shopping malls, which are conceived as a sentient space. The paper outlines the reference scenario and provides an overview of the architecture and functionality of the demonstrator, as well as a roadmap for its development and evaluation. |
11:37 | Abeto framework: a Solution for Heterogeneous IP Management ABSTRACT. The use of third-party IP cores tends to present difficulties because of a lack of standardization in their packaging, distribution and management. Often, IP users find themselves writing code to enable the integration or testing of the IP core, which is not available as part of their distribution. In this work Abeto is presented, a new software tool for IP core database management. It has been conceived to integrate and use a heterogeneous group of IP cores, described in HDL, with a unified set of instructions. In order to do so, Abeto requires from every IP core some side information about its packaging and how to operate with the IP. Currently, Abeto provides support for a set of common EDA tools and has been successfully applied to the European Space Agency portfolio of IP cores for benchmarking purposes. |
Title: Open-Source Research on Time-predictable Computer Architecture
Abstract: Real-time systems need time-predictable computers to be able to guarantee that computation can be performed within a given deadline. For worst-case execution time analysis we need detailed knowledge of the processor and memory architecture. Providing the design of a processor in open-source enables the development of worst-case execution time analysis tools without the unsafe reverse engineering of processor architectures. Open-source software is currently the basis of many Internet services, e.g., an Apache web server running on top of Linux with a web application written in Java. Furthermore, for most programming languages in use today, there are open-source compilers available. However, hardware designs are seldom published in open-source. Furthermore, many artifacts developed in research, especially hardware designs, are not published in open-source. The two main arguments formulated against publishing research in open source are: (1) “When I publish my source before the paper gets accepted, someone may steal my ideas” and (2) “My code is not pretty enough to publish it, I first need to clean it up (which seldom happens)”. In this paper and in the presentation I will give counterarguments for those two issues. I will present the successful T-CREST/Patmos research project, where almost all artifacts have been developed in open-source from day one. Furthermore, I will present experiences using the Google/Skywater open-source tool flow to produce a Patmos chip with 12 students within a one-semester course.
Buffet lunch at Lopesan Baobab Resort.
14:30 | Task Mapping and Scheduling in FPGA-based Heterogeneous Real-time Systems: A RISC-V Case-Study ABSTRACT. Heterogeneous platforms that integrate CPU and FPGA-based processing units are emerging as a promising solution for accelerating various applications in the embedded system domain. However, in this context, comprehensive studies that combine the theoretical aspects of real-time scheduling of tasks along with practical runtime architectural characteristics have mostly been neglected so far. To fill this gap, in this paper we propose a real-time scheduling algorithm with the objective of minimizing the overall execution time under hardware resource constraints for heterogeneous CPU+FPGA architectures. In particular, we propose an Integer Linear Programming (ILP) based technique for task allocation and scheduling. We then show how to implement a given schedule on a practical CPU+FPGA system regarding current technology restrictions and validate our methodology using a practical RISC-V case-study. Our experiments demonstrate that performance gains of 40% and area usage reductions of 67% are possible compared to a full software and hardware execution, respectively. |
14:55 | Decomposition of transition systems into sets of synchronizing Free-choice Petri Nets ABSTRACT. Petri nets and transition systems are two important formalisms used for modeling concurrent systems. One interesting problem in this domain is the creation of a Petri net with a reachability graph equivalent to a given transition system. This paper focuses on the creation of a set of synchronizing Free-choice Petri nets (FCPNs) from a transition system. FCPNs are more amenable to visualization and structural analysis while not being excessively simple, as in the case of state machines. The results show that with a small set of FCPNs, the complexity of the model can be reduced when compared to the synthesis of a monolithic Petri net. |
15:12 | Placement of Chains of Real-Time Tasks on Heterogeneous Platforms under EDF Scheduling PRESENTER: Daniel Casini ABSTRACT. When designing a real-time system, application architects are called to settle many non-trivial decisions that may severely influence the system's performance. With modern hardware platforms always being more and more complex and equipped with heterogeneous processor cores or even hardware accelerators such as TPUs, FPGAs, or GPUs, the complexities to be faced by application architects are exacerbated. Therefore, they are called to wisely allocate the computational resources provided by the hardware platform to application tasks in such a way as to meet timing requirements and optimize other goals such as energy consumption. This paper proposes a mixed-integer linear programming (MILP) formulation to solve the task-to-heterogeneous-cores allocation problem while guaranteeing the schedulability of a real-time application running on the platform under partitioned Earliest Deadline First (EDF) scheduling. A new method to derive approximate worst-case response-time bounds is also presented and leveraged to set up the MILP formulation, which allows computing and minimizing the end-to-end latency of processing chains and considers energy requirements. The approach is evaluated on a task set based on the WATERS 2019 Industrial Challenge proposed by Bosch. |
15:29 | Prebypass: Software Register File Bypassing for Reduced Interconnection Architectures PRESENTER: Kanishkan Vadivel ABSTRACT. Exposed Datapath Architectures (EDPAs) with aggressively pruned data-path connectivity, where not all function units in the design have connections to a centralized register file, are promising solutions for energy-efficient computation. A direct bypassing of data between function units without temporary copies to the register file is a prime optimization for programming such architectures. However, traditional compiler frameworks, such as LLVM, assume function-units connect to register-files and allocate all live variables in register-files. This leads to schedule inefficiencies in terms of ILP and register accesses in the EDPAs. To address these inefficiencies, we propose Prebypass, a new optimization pass for EDPA compiler backends. Experimental results on an EDPA class of architecture, Transport-Triggered Architecture, show that Prebypass improves the runtime, register reads, and register writes by up to 16%, 26%, and 37% respectively, when the datapath is extremely pruned. Evaluation in a 28-nm FDSOI technology reveals that Prebypass improves the core-level energy by 17.5% over the current heuristic scheduler. |
15:46 | X-on-X: Distributed Parallel Virtual Platforms for Heterogeneous Systems ABSTRACT. The complexity of modern heterogeneous systems leads to simulation performance problems. We show how heterogeneous system verification can be accelerated using a heterogeneous simulator architecture, by distributing simulations amongst different hosts with a novel SystemC TLM-compliant method. Hosts are combined via a high-speed network to leverage their specific advantages when executing simulation segments. To avoid timing causality problems, a conservative, asynchronous parallel discrete event simulation approach is used. We analyze a machine learning task on an embedded Linux system using an ARMv8 virtual platform containing a commercial deep learning accelerator. There, our approach enables speedups of up to 3.9x. |
14:30 | ARTS: An adaptive regularization training schedule for activation sparsity exploration ABSTRACT. Brain-inspired event-based processors have attracted considerable attention for edge deployment because of their ability to efficiently process Convolutional Neural Networks (CNNs) by exploiting sparsity. On such processors, one critical feature is that the speed and energy consumption of CNN inference are approximately proportional to the number of non-zero values in the activation maps. Thus, to achieve top performance, an efficient training algorithm is required to largely suppress the activations in CNNs. We propose a novel training method, called Adaptive-Regularization Training Schedule (ARTS), which dramatically decreases the non-zero activations in a model by adaptively altering the regularization coefficient through training. We evaluate our method across an extensive range of computer vision applications, including image classification, object recognition, depth estimation, and semantic segmentation. The results show that our technique can achieve 1.41 times to 6.00 times more activation suppression on top of ReLU activation across various networks and applications, and outperforms the state-of-the-art methods in terms of training time, activation suppression gains, and accuracy. A case study for a commercially-available event-based processor, Neuronflow, shows that the activation suppression achieved by ARTS effectively reduces CNN inference latency by up to 8.4 times and energy consumption by up to 14.1 times. |
14:55 | Co-Optimization of DNN and Hardware Configurations on Edge GPUs PRESENTER: Halima Bouzidi ABSTRACT. The ever-increasing complexity of both Deep Neural Networks (DNN) and hardware accelerators has made the co-optimization of these two domains extremely complex. Previous works typically focus on optimizing DNNs given a fixed hardware configuration or optimizing a specific hardware architecture given a fixed DNN model. Recently, the importance of the joint exploration of the two spaces has drawn more and more attention. Our work targets the co-optimization of DNN and hardware configurations on edge GPU accelerators. We propose an evolutionary-based co-optimization strategy for DNN by considering three metrics: DNN accuracy, execution latency and power consumption. By combining the two search spaces, a high number of interesting solutions can be explored in a short time interval. In addition, a better tradeoff between DNN accuracy and hardware efficiency can be obtained. Experimental results show that the co-optimization outperforms the traditional optimization of DNN on a fixed hardware configuration with up to 53% power consumption reduction without impacting accuracy and inference time. |
15:20 | Hardware Acceleration of Deep Neural Networks for Autonomous Driving on FPGA-based SoC PRESENTER: Alessandro Biondi ABSTRACT. In the last decade, enormous and renewed attention to Artificial Intelligence has emerged thanks to Deep Neural Networks (DNNs), which can achieve high performance in performing specific tasks at the cost of a high computational complexity. GPUs are commonly used to accelerate DNNs, but generally entail very high power consumption and poor time predictability. For this reason, GPUs are becoming less attractive for resource-constrained, real-time systems, while there is a growing demand for specialized hardware accelerators that can better fit the requirements of embedded systems. Following this trend, this paper focuses on hardware acceleration for the DNNs used by Baidu Apollo, an open-source autonomous driving framework. As an experience report of performing R&D with industrial technologies, we discuss challenges faced in shifting from GPU-based to FPGA-based DNN acceleration when performed using the DPU core by Xilinx deployed on an Ultrascale+ SoC FPGA platform. Furthermore, the paper shows the pros and cons of today's hardware acceleration tools. Experimental evaluations were conducted to evaluate the performance of FPGA-accelerated DNNs in terms of accuracy, throughput, and power consumption, in comparison with those achieved on embedded GPUs. |
15:37 | RRAM-based Neuromorphic Computing: Data Representation, Architecture, Logic, and Programming ABSTRACT. RRAM crossbars provide a promising hardware platform to accelerate matrix-vector multiplication in deep neural networks (DNNs). To exploit the efficiency of RRAM crossbars, extensive research examining architecture, data representation, logic design as well as device programming should be conducted. This extensive scope of research aspects is enabled and required by the versatility of RRAM cells and their organization in a computing system. These research aspects affect or benefit each other. Therefore, they should be considered systematically to achieve an efficient design in terms of design complexity and computational performance in accelerating DNNs. In this paper, we illustrate study examples on these perspectives on RRAM crossbars, including data representation with voltage levels, architecture improvement, implementation of logic functions using RRAM cells, and efficient programming of RRAM devices for accelerating DNNs. |
14:30 | Network on Privacy-Aware Audio-and Video-Based Applications for Active and Assisted Living: GoodBrother Project ABSTRACT. Active and Assisted Living (AAL) systems aim to improve the lives of older or impaired people in various aspects. However, the use of equipment for data acquisition in these systems can be considered intrusive in some cases. Although de-identification may provide the needed protection to some extent, it is not always preferred, as it could affect the quality and utility of any obtained data. It is therefore crucial to establish methodologies for protecting the privacy of those monitored and thus affected by AAL systems. The purpose of GoodBrother is to a) analyze any issues arising from the use of monitoring AAL systems, regarding the users’ privacy; b) establish proper guidelines for the use of these systems; c) develop privacy-aware methodologies for data handling; d) increase the systems’ robustness and reliability; e) create databases to use towards benchmarking. Each one of these objectives is handled by a separate interdisciplinary working group. |
14:55 | SmartDelta: Automated Quality Assurance and Optimization in Incremental Industrial Software Systems Development PRESENTER: Mehrdad Saadatmand ABSTRACT. A common phenomenon in software development is that as a system is being built and incremented with new features, certain quality aspects of the system begin to deteriorate. Therefore, it is important to be able to accurately analyze and determine the quality implications of each change and increment to a system. To address this topic, the multinational SmartDelta project develops automated solutions for quality assessment of product deltas in a continuous engineering environment. The project will provide smart analytics from development artifacts and system executions, offering insights into quality degradation or improvements across different product versions, and providing recommendations for next builds. |
15:20 | COMP4DRONES: Key Enabling Technologies for Drones to enhance Mobility and Logistics Operations ABSTRACT. This paper presents the achievements of the COMP4DRONES project, a European project that has developed key technologies to deploy innovative drone-based services. It aims to raise awareness of the potential for future mobility and logistics applications by integrating drones in the Intelligent Transport Systems. The paper presents the results of two use cases in which different transport and mobility stakeholders have included drones in their operations to validate the COMP4DRONES framework as well as the key technologies to enable the use of drones in the mobility and the transport sector. |
Expomeloneras's Hall
Skeptical Dynamic Dependability Management for Automated Systems ABSTRACT. Dynamic Dependability Management (DDM) is a promising approach to guarantee and monitor the ability of safety-critical Automated Systems (ASs) to deliver the intended service with an acceptable risk level. However, the non-interpretability and lack of specifications of the Learning-Enabled Components (LECs) used in ASs make this mission particularly challenging. Some existing DDM techniques overcome these limitations by using probabilistic environmental perception knowledge associated with predicting behavior changes for the agents in the environment. We propose to improve these techniques with a supervisory system that considers hazard analysis and risk assessment from the design stage. This hazard analysis is based on a characterization of the AS's operational domain (i.e., its scenario space including unsafe ones). This proposed supervisory system also considers the uncertainty estimation and interaction between AS components through the whole perception-planning-control pipeline. Our framework then proposes leveraging and handling uncertainty from LEC components toward building safer ASs. |
Hardware architecture for high throughput event visual data filtering with matrix of IIR filters algorithm ABSTRACT. Neuromorphic vision is a rapidly growing field with numerous applications in the perception systems of autonomous vehicles. Unfortunately, due to the sensor's working principle, there is a significant amount of noise in the event stream. In this paper we present a novel algorithm based on an IIR filter matrix for filtering this type of noise and a hardware architecture that allows its acceleration using an SoC FPGA. Our method has a very good filtering efficiency for uncorrelated noise -- over 99% of noisy events are removed. It has been tested for several event data sets with added random noise. We designed the hardware architecture in such a way as to reduce the utilisation of the FPGA's internal BRAM resources. This enabled a very low latency and a throughput of up to 385.8 MEPS (million events per second). The proposed hardware architecture was verified in simulation and in hardware on the Xilinx Zynq Ultrascale+ MPSoC chip on the Mercury+ XU9 module with the Mercury+ ST1 base board. |
Monitoring Framework to Support Mixed-Criticality Applications on Multicore Platforms ABSTRACT. The automotive industry is looking into integrated architectures to combine multiple application subsystems of different criticalities on the readily available low-cost multicore platforms as they promise several benefits. However, it is difficult to achieve the required isolation and guarantees in such integrated architectures due to contention in the shared resources, such as CPU, shared-bus, memory (controller), and network. This can cause unpredictable delays leading to deadline misses in real-time applications. We propose a low-overhead modular monitoring framework to provide support towards ensuring that the real-time applications meet their deadlines when considering shared-resource accesses, and helping to improve resource utilization so that best-effort applications can achieve a better Quality-of-Service despite pessimistic resource allocation assumptions of real-time applications. Our framework keeps the monitoring overheads to a minimum and the triggered reactions meaningful by operating on the basis of low-level hardware and software signals, strategically checking resources, and triggering actions based on abstract availability of resources. We propose a Domain-Specific Language (DSL) to relieve the system designers from the tedious and error-prone job of configuring platform-specific parameters for the monitoring framework. Finally, this paper evaluates our monitoring framework based on an instantiation of the framework on a Xilinx Zynq UltraScale multicore SoC running Linux and a simple industry-inspired use case. |
AxE: An Approximate–Exact Multi-Processor System-on-Chip Platform PRESENTER: Nima Taherinejad ABSTRACT. Due to the ever-increasing complexity of the computing tasks, emerging computing paradigms that increase efficiency, such as approximate computing, are gaining momentum. However, so far the majority of proposed solutions for hardware-based approximation have been application-specific and/or limited to a smaller unit of the computing system and require engineering effort for integration into the rest of the system. In this paper, we present the Approximate and Exact Multi-Processor System-on-Chip (AxE) platform. AxE is the first general-purpose approximate Multi-Processor System-on-Chip (MPSoC). AxE is a heterogeneous RISC-V platform with exact and approximate cores that allows exploring hardware approximation for any application and using software instructions. Using the full capacity of an entire MPSoC, especially a heterogeneous one such as AxE, is an increasingly challenging problem. Therefore, we also propose a task scheduling method for running exact and approximable applications on AxE. That is, a mixed application mapping, in which applications are viewed as a set of tasks that can be run independently on different processors with different capabilities (exact or approximate). We evaluated our proposed method on AxE and reached a 32% average execution speed-up and 21% energy consumption saving with an average of 99.3% accuracy on three mixed workloads. We also ran a sample image processing application, namely gray-scale filter, on AxE and will present its results. |
Towards Skin Cancer Self-Monitoring through an Optimized MobileNet with Coordinate Attention ABSTRACT. Skin cancer is the most frequent type of cancer, of which there are two types: melanoma and non-melanoma. Melanoma is the least common, but also the deadliest if left untreated in early stages. Thus, skin cancer monitoring is key for early detection, which could be done with the help of mobile devices and artificial intelligence solutions. In this sense, local deployment is suggested to embrace simplicity and avoid data privacy and security issues. However, current high-performance neural networks are extremely challenging to deploy on mobile devices due to resource constraints, so lighter but effective models are required to make local deployment possible. In this work, pruning an already light model, such as MobileNetV2, is pursued, combining it with an attention mechanism to enhance the network's capability to learn and compensate for the lack of information that pruning the original architecture might cause. Fine-tuning was applied, using an autoencoder to pre-train the model on the CIFAR100 dataset. Experiments covering four scenarios were carried out using the HAM10000 dataset. Promising results were obtained, reaching the best performance using a pruned MobileNetV2 combined with the Coordinate Attention mechanism, with less than a million parameters in total and an accuracy of up to 83.93%. |
Hardware Support for Predictable Resource Sharing in Virtualized Heterogeneous Multicores ABSTRACT. The lack of predictable resource sharing in heterogeneous multicore systems leads to the need for deterministic scheduling of shared resources, especially in safety-critical applications. As model-based design plays an ever-increasing role in the development of applications in these domains, a parameterizable modelling approach is necessary to handle the complexity and improve the efficiency of the developed system. Moreover, virtualization is considered as one of the main technologies to achieve densely integrated systems with high assurance for safety and security. In this work we propose a scheduling approach that allows a deterministic, segregated and efficient management of requests targeting a shared resource in safety-critical multicores and can be used to guarantee a certain quality of service. Based on a formal description of our developed scheduling algorithm, we demonstrate the possible parameter sets of the algorithm within the design space. Furthermore, we show our evaluation of different scenarios in heterogeneous multicores including their processing latencies. |
Mobile Systems Secure State Management ABSTRACT. Today's mobile devices are equipped with sophisticated chain-of-trust mechanisms, able to successfully mitigate tampering of critical software components. On the one hand, this technology hinders the permanence of malware, thus raising the complexity of developing rootkits. On the other hand, the freedom of the end-user itself is limited. In fact, with all the security features enabled, one could not run any privileged code without it being signed by the Original Equipment Manufacturer (OEM); modifying any component of the root partition would cause a device read error and small modifications could even be rolled back automatically. OEMs typically provide mechanisms to (partially) disable these security features. However, they usually require two conditions: first, every unlock request must be approved by them, e.g. for warranty implications; secondly, to preserve the device security level, each time a security feature is disabled, the user data must be completely erased. We analyze several bootloader-related vulnerabilities which allow bypassing those two requirements by exploiting design and implementation flaws in smartphones from different vendors. We then propose a novel architecture for secure device status storage and management. Our proposal relies only on commodity hardware features, which can be found on most mobile platforms. Furthermore, differently from many commercial implementations, we do not consider the storage device firmware as trusted; this makes our attack surface smaller than all of the examined alternatives. |
SecDec: Secure Decode Stage thanks to masking of instructions with the generated signals ABSTRACT. Physical attacks are becoming a major security issue in IoT applications. One of the main vectors of attacks on processors is the corruption of the execution flow. Fault injections allow the modification of instructions, in particular jumps and branches. The proposed approach is to enforce the instruction path of a RISC-V processor. The signals extracted from the DECOD stage are used to unmask the following instruction, all instructions having been previously masked during compilation with the expected mask. We show that this solution has a very low hardware overhead of 3.25% and power consumption overhead of 4.33%. An instruction corruption or a jump will be detected on average in less than 2 cycles after the fault, while disassembly from side-channel leakage becomes more difficult. |
At the social event, we will show you the Canarian culture and its prehispanic origins in the park Mundo Aborigen. The visitors are welcomed by a traditional aboriginal town in an outstanding location, outside of the touristic area. Finally, we will admire the ravine of Fataga, which is part of the Gran Canaria World Biosphere Reserve declared by UNESCO. The return is at 19:00, so you will have free time to get ready for the Social Dinner at 20:00h at the Lopesan Villa del Conde Resort & Thalasso.
Social dinner at 19:45 h at the Lopesan Villa del Conde Resort & Thalasso including a traditional Canarian music concert.