IEEE CAS Distinguished Lecturer: Sudipto Chakraborty
Title: Ultra-low Power Cryo-CMOS Designs for Next-Generation Quantum Computing
This talk will cover practical challenges in cryogenic CMOS design for next-generation quantum computing. Starting from the system level, it will detail the design considerations for a non-multiplexed, semi-autonomous transmon qubit state controller (QSC) implemented in 14 nm CMOS FinFET technology. The QSC includes an augmented general-purpose digital processor that supports waveform generation and phase rotation operations, combined with a low-power, current-mode, single-sideband upconversion I/Q mixer-based RF arbitrary waveform generator (AWG). The QSC generates control signals in its target 4.5 GHz to 5.5 GHz frequency range, achieving an SFDR > 50 dB for a signal bandwidth of 500 MHz. With the controller operating in the 4 K stage of a cryostat and connected to a transmon qubit in the cryostat’s millikelvin stage, measured transmon T1 and T2 coherence times were 75.7 μs and 73 μs, respectively, in each case comparable to results achieved using conventional room-temperature controls. In further tests with transmons, a qubit-limited error rate of 7.76×10⁻⁴ per Clifford gate is achieved, again comparable to results achieved using room-temperature controls. The QSC’s maximum RF output power is -18 dBm, and power dissipation per qubit under active control is 23 mW.
10:30 | Design of GFET-based active modulators leveraging device performance reproducibility conditions PRESENTER: Anibal Pacheco-Sanchez ABSTRACT. High-data-rate modulators, namely phase-shift keying (PSK) and quadrature (Q) PSK, are designed in this work based on an experimentally calibrated compact graphene field-effect transistor (GFET) model. The circuits consist of a single device for the PSK design and of two GFETs for the QPSK proposal. In order to diminish the unavoidable impact of traps on this emerging transistor technology, the bias conditions for the circuit operation correspond to a trap-reduced measurement scenario where reproducible device characteristics have been experimentally obtained. This proposal exploits the device reproducibility and the multifunctional circuit capabilities of GFET technology. Hence, the outstanding inherent characteristics of GFETs in dynamic operation can be used in a high-performance application by controlling the trap states without additional costly technology solutions. |
10:50 | Devices and circuits for HF applications based on 2D materials PRESENTER: Henri Happy ABSTRACT. Graphene and related 2D materials have been widely studied in recent years. As a result, many components have been developed to fulfill numerous applications in the field of high-frequency electronics and telecommunications. This is the case of graphene field-effect transistors (GFETs), which can be used to realize amplification functions and splitters, and of photodetectors for optical mixing. Recently, 2D materials have also shown their ability to be used as analog switches to transmit very high frequency signals, based on their ultimate thickness. This paper will describe some routes that have been used to fabricate these devices and circuits, as well as the characterization of their performance. |
11:10 | On the influence of technological parameters on the expected performance of GFET-based mixers. PRESENTER: Mari Carmen Pardo Martínez ABSTRACT. Ambipolar conductance in Graphene Field-Effect Transistors (GFETs), and in particular their quasi-quadratic transfer characteristic, makes these devices excellent candidates for exploiting subharmonic mixing. Several realizations have already demonstrated the ability of GFETs to compete with or even improve state-of-the-art mixers based on traditional semiconductor materials. Nonetheless, a systematic analysis of the influence of GFET technology on the resulting performance of mixer circuits is still lacking. In this work, we provide an in-depth evaluation of the impact of both geometrical (e.g., device length, insulator thickness) and technological (e.g., mobility, contact resistance, etc.) parameters on the performance at a circuit level. To that end, we employ a commercial RF circuit simulator together with a large-signal compact model developed by some of the authors. Our results provide design guidelines to be considered for future implementations of GFET-based mixers. |
10:30 | Experimental Demonstration of Associative Memory in Coupled Differential Oscillator Networks PRESENTER: Juan Núñez ABSTRACT. Nano-oscillators based on phase-transition materials are being investigated for various non-traditional computing paradigms. Specifically, vanadium dioxide (VO2) devices are used to design self-sustained non-linear oscillators that can be employed in oscillatory neural networks (ONNs). In addition, in these ONN architectures sub-harmonic injection locking (SHIL) can be exploited to ensure that each neuron's phase information can only adopt one of two possible values. An integrated circuit demonstrator of an analog 9-neuron ONN has been designed and fabricated in a deep-submicron commercial technology. The oscillators forming the neurons closely resemble those designed using VO2 devices. The capability of the fabricated ONN to work as an associative memory has been tested. An example with two stored patterns has been used to show that the ONN successfully stores the two patterns and exhibits the associative memory functionality. |
10:50 | Three-Stage Low Dropout Regulator with Enhanced Transient Response and Regulation Performance PRESENTER: Belen Calvo-López ABSTRACT. This paper presents a novel on-chip capacitor-less LDO regulator design that addresses the challenges posed by the absence of a bulky output capacitor. The proposed LDO regulator features a two-stage error amplifier with high gain, achieving enhanced regulation performance. Compensation is based on the Reverse Nested Miller approach, and a class AB topology is employed to improve transient performance. Additionally, a Time Response Enhancement Circuit (TREC) is included to optimize the transient response with minimal size-power trade-off. Simulation results demonstrate the regulator's ability to provide a regulated output voltage of 1.8V with a 1.9-3V input supply, low dropout voltage of 100mV, and excellent transient performance. This paper shows that the proposed LDO regulator design achieves a competitive performance compared to existing state-of-the-art solutions. |
11:10 | Super Class AB Amplifiers: A Unifying Approach PRESENTER: Antonio Lopez-Martin ABSTRACT. A general super class AB OTA based on Local Common-Mode Feedback is presented and analyzed. It allows design flexibility by providing design parameters that can be tailored to a given application. Based on this general architecture, it is shown how power efficiency of the OTA can be optimized. Two particular realizations of this general OTA have been fabricated in a 130 nm CMOS process. Measurement results validate the proposed approach. |
10:30 | Design of SoC FPGA based controller to reduce shadow effects in photovoltaic installations PRESENTER: Pedro Pérez Carballo ABSTRACT. This paper proposes the implementation of a dynamic reconfiguration algorithm of a photovoltaic array, together with a maximum power point tracker. The controller will maximize the output power of a photovoltaic array under partial shading conditions. Also, a System-On-Chip in the Loop design flow that integrates MATLAB/Simulink and Vivado Design Suite tools is introduced. The designed control system is prototyped on the ZedBoard development board. Verification of the implemented system on the ZedBoard is performed in conjunction with the system model in Simulink. The implemented system is scalable, has low resource utilization and power consumption. |
10:50 | Integrated Cuk Inverter for Single-Phase Grid-Tied Photovoltaic System PRESENTER: Leonardo Sampaio ABSTRACT. In this paper, a novel integrated converter designed to step-up the photovoltaic (PV) array voltage, and also to inject the extracted PV array energy into the single-phase AC mains is presented. The DC-AC topology is deployed using two modified Cuk converters, named as Integrated Cuk Inverter (ICI). The ICI output current is controlled employing the hysteresis current control. The maximum power point of the PV array is tracked down by using the perturb and observe technique. Therefore, the presented ICI topology is proposed to be used in the replacement of the traditional double stage PV system. Moreover, the injected current into the grid presents low total harmonic distortion. From simulation results, the feasibility of the proposed system, as well as the control technique are validated. |
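The perturb and observe technique mentioned in the abstract above can be illustrated with a minimal sketch. It is not the authors' controller: the function names, the fixed step size and the toy power-voltage curve are all assumptions chosen only to show the hill-climbing idea (perturb the operating voltage, keep the direction if power rose, reverse it if power fell).

```python
def perturb_and_observe(power_at, v_start, step=0.5, iterations=200):
    """Hill-climb the operating voltage towards the maximum power point.

    power_at: callable returning the array power at a given voltage
              (a hypothetical stand-in for a real measurement).
    """
    v = v_start
    p_prev = power_at(v)
    direction = 1.0  # +1 raises the voltage, -1 lowers it
    for _ in range(iterations):
        v += direction * step       # perturb
        p = power_at(v)             # observe
        if p < p_prev:              # power dropped: reverse the direction
            direction = -direction
        p_prev = p
    return v


def toy_pv_power(v):
    # Hypothetical concave P-V curve peaking at 30 V and 200 W.
    return max(0.0, 200.0 - 0.5 * (v - 30.0) ** 2)


v_mpp = perturb_and_observe(toy_pv_power, v_start=20.0)
print(f"Operating voltage after tracking: {v_mpp:.1f} V")
```

With a fixed step the operating point keeps oscillating around the peak by about one step, which is the usual trade-off between tracking speed and steady-state ripple in this family of trackers.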
SMART CORROSION MONITORING BASED ON NON-DESTRUCTIVE TESTINGS (NDTs) PRESENTER: Upeksha Chathurani ABSTRACT. Offshore renewable energy has seen rapid growth in recent years. For long-term operation, the life span of the installed offshore structures is crucial. By deteriorating materials and their qualities, corrosion is one of the main reasons for offshore structural failure. It reduces the useful service life of the materials and causes structures, equipment, and other objects to fail in their intended function. Therefore, a methodical corrosion monitoring system would be beneficial for effective corrosion management, giving early warnings of failures and leading to lower operational and maintenance costs. Nowadays, corrosion monitoring solutions are moving towards real-time data acquisition, multi-parameter monitoring, intelligent and smart sensing, and remote data logging systems that can be permanently installed in the field. Corrosion monitoring is beneficial in several ways, such as providing early warnings of damage from corrosion-induced failures, studying the correlation between process parameters and system corrosivity, identifying the causes of corrosion and the parameters that control it (pressure, temperature, pH, flow rate, etc.), evaluating the efficacy of the corrosion prevention/control technique, and facilitating the scheduling of maintenance tasks as necessary. Different non-destructive testing (NDT) techniques have been used to detect corrosion under coatings and produce quantitative corrosion measures in an effort to improve corrosion detection. They are typically categorized as inspection devices that carry out corrosion measurements less frequently. It is also essential to test how well these systems perform over time for a particular application under its own environmental challenges. Despite that, unresolved challenges remain in relation to corrosion detection under coatings using NDT techniques, obtaining accurate quantitative measures over a long time, the type of corrosion to detect (pitting, uniform, fatigue, etc.), and unattended system operation at low power with minimal cost. This research work involves developing and testing such a corrosion monitoring system for steel wind turbine constructions and concrete reinforcing bars operating in challenging offshore environments. |
FPGA-based Acceleration of AI Structures for RIS Applications PRESENTER: Ruben Padial ABSTRACT. This work addresses the development of 3D reconfigurable intelligent surfaces (RIS) for smart and energy-sustainable wireless communications. 3D-RIS are configured and optimized with Artificial Intelligence (AI) algorithms, which are accelerated using reconfigurable digital hardware devices. AI algorithms are optimized for the target FPGA device using innovative High Level Synthesis (HLS) tools as well as HDL, with the aim of real-time acceleration. These activities belong to a multi-disciplinary project and lead to a close development of AI algorithms in order to reach a more efficient HW implementation. The goal is to have a RIS demonstrator controlled by AI accelerated on an FPGA device and to integrate it into the whole system. |
Oceanographic Profiler for Long-Term Measurements of Energy and Tidal Currents ABSTRACT. A novel oceanographic instrument, founded on tilt by drag, for obtaining a tidal-current profile (tidal-current-meter) was developed. The ultra-low power embedded intelligence enables the long-term (>180 days) modelling and characterization of tides in offshore areas. The instrument records tilts over long time periods for making an oceanographic atlas comprising all semidiurnal and diurnal tidal-current constituents, as well as wind-driven inertial currents. The primary goal of the tidal-current-meter is to provide a tool for analyzing the frequency variability of ocean dynamics. The instrument consists of a system of systems in which several linear arrays of tidal-current sensors are deployed on the seabed across the area under study. Each linear array of tidal-current sensors is built by stacking a set of tidal-current devices (see Fig. 2). The spatial-vertical resolution is 10 cm, and the sampling frequency is 12.5 Hz. The tidal-current sensor was calibrated in a swing chamber equipped with ultrasonic current meters. |
Maximally Digital VCO-based ADCs with Programmable Pulse Shaping Filters ABSTRACT. This work addresses why the digitalization of analog blocks is necessary and how my PhD thesis can help to achieve more linear VCOs based on inverter ring oscillators and more efficient digital filters without a conventional FIR structure. This will make architectures more scalable and efficient while maintaining the specifications achievable with analog circuits. |
Inertial-Instrument for Monitoring Offshore-Cage Mooring-Line Dynamics ABSTRACT. A novel instrument based on acceleration measurements is deployed for obtaining the mooring-line dynamics of offshore aquaculture cages. A set of instruments as a sensor network were deployed at different depths of the main mooring line of the aquaculture Aquanaria S.L. facility in Gran Canaria, (Spain, Atlantic Ocean). The proposed instrument records accelerations and computes the fundamental frequencies of the resulting forces. Long-term (one week) draft forces have been analyzed in the time domain, and the prototype has been tested for 189 days at a depth in the range of 15 − 20 meters using a lithium-ion battery. The resolution of the instrument in terms of draft velocity is 0.01 cm/s for a velocity of 0.8 m/s. The inertial instrument is a tool that can be used to analyze the high-frequency variability of ocean dynamics. Linear acceleration samples are processed on board (edge computing) for calculating amplitudes and frequencies of the resulting force fundamental-components and using integer arithmetic. Cloud computing provides stress information of the mooring-line for offline monitoring. |
SpeedEdge – Acceleration microarchitectures for Edge applications PRESENTER: Gonzalo Salinas ABSTRACT. This PhD thesis aims to provide a novel acceleration solution for AI and other state-of-the-art workloads to cover the technology gap in the transition of such workloads from the cloud to the edge. The proposed solution is a flexible and customizable framework for acceleration of fundamental optimized operations for the aforementioned workloads on IoT devices. The framework can run several types of AI models, and hardware design space exploration can be done according to the specific model parameters. The methodology involves the study of fundamental mathematical operations of state-of-the-art Edge workloads such as AI, design space exploration of hardware accelerators, architecture definition, prototyping on FPGA, generalization of the accelerator, integration on FAFEC (Framework for Acceleration of Fundamental Edge Computations). Finally, the acceleration hardware will be integrated in a RISC-V SoC. The expected results and impact include support for the A-IQ Ready European project (Horizon 2020), industrial PhD publications in journals (CEI & UPM), possible patents for novel industrial IP based on the acceleration framework (NVISION), and open-source contributions (RISC-V). |
Unleashing the Potential of 2D Material Devices in Nonlinear RF Circuits ABSTRACT. This thesis project aims to develop multiscale simulation tools for the design and optimization of RF circuits based on flexible 2D material technologies. The devices under study use their I-V nonlinearity as working principle, enabling a variety of applications such as frequency conversion and rectification. After the development of the necessary physical, numerical and compact models, highly efficient mixers and rectifiers with low intermodulation distortion will be designed and optimized experimentally. |
FPGA-Based Acceleration for Emerging Neuromorphic Computing Paradigms ABSTRACT. Spiking neural networks (SNNs) promise to perform tasks currently done by classical artificial neural networks (ANNs) faster, in smaller footprints and using less energy. This is yet to be proved in many applications, so in order to gain more insight into these new information processing paradigms, researchers need tools to simulate their networks. There are several CPU- and GPU-based implementations of SNNs, but these can be complex and large, and their heavily parallel nature makes big simulations a challenge, so there is an increasing need for acceleration. FPGA devices are strong candidates for this, as their distributed architecture is very well suited to spiking neuron implementation. However, in order to have efficient hardware, network designers have to deal with implementing their own circuits. This work aims to overcome that hurdle with the development of an SNN-to-FPGA framework, in which we generate FPGA-based hardware accelerators from high-level representations of SNNs written in a network description language like PyNN. It uses flexible processing cores with optimized hardware circuits to implement a number of supported neuron models (e.g. LIF), and is able to generate whole networks and wrap them in RISC-V compatible peripherals to be run on FPGA devices. |
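As background for the LIF neuron model named in the abstract above, the following is a minimal software sketch of a leaky integrate-and-fire neuron updated with a forward-Euler step. It says nothing about the framework's hardware cores: the function name and all parameter values (time step, time constant, threshold, reset) are illustrative assumptions.

```python
def simulate_lif(input_current, dt=1e-3, tau=20e-3, v_rest=0.0,
                 v_thresh=1.0, v_reset=0.0, resistance=1.0):
    """Simulate one leaky integrate-and-fire neuron.

    input_current: one input sample per time step.
    Returns the list of step indices at which the neuron spiked.
    """
    v = v_rest
    spikes = []
    for step, i_in in enumerate(input_current):
        # Leak towards v_rest while integrating the input (forward Euler).
        v += (dt / tau) * (-(v - v_rest) + resistance * i_in)
        if v >= v_thresh:
            spikes.append(step)   # emit a spike ...
            v = v_reset           # ... and reset the membrane potential
    return spikes


weak = simulate_lif([0.5] * 200)    # settles below threshold: no spikes
strong = simulate_lif([2.0] * 200)  # crosses threshold repeatedly
print(len(weak), len(strong))
```

Each neuron only needs its own state and a few multiply-add operations per step, which is one reason the model maps naturally onto many small parallel hardware units.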
Real-time embedded eye detection and analysis framework PRESENTER: Camilo Andres Ruiz Beltran ABSTRACT. The information contained within a person’s eyes is essential to many systems, such as iris recognition, drowsiness detection and gaze estimation, among others. Furthermore, if those systems are intended to process images in real time, this can be achieved by implementing the system in hardware. This can help to meet constraints such as small size, portability, high speed, low power consumption and many more. We propose and build a prototype using a System on Chip with a logic side, in which algorithms are accelerated, and a software side that controls all the peripherals and connectivity, achieving real-time eye detection (full logic design) and high image processing throughput.
Run-Time ML-Based Modeling and Management of Reconfigurable Multi-Accelerator Systems with Virtualization Support ABSTRACT. Reconfigurable multi-accelerator systems working in IoT environments deal with dynamically changing workloads and operating conditions. It is therefore crucial to optimize them to take advantage of their execution performance while complying with power consumption constraints and application requirements. This thesis proposes 1) the use of machine learning techniques to model power consumption and performance in reconfigurable multi-accelerator systems at run-time, and 2) the application of smart decision-making algorithms for resource management at node level and across the edge-cloud continuum. |
Time-encoded audio MEMS sensors optimized for edge computing PRESENTER: Michele Noviello ABSTRACT. Much research in recent years has shown that Voltage-Controlled-Oscillator (VCO)-based Sigma-Delta ADCs are a promising solution for reducing chip area and power consumption. One of their applications is the readout of MEMS capacitive sensors, which usually require biasing circuitry (a charge pump and a high-ohmic). The biasing circuitry introduces a high bias voltage on the chip and is not easily scalable to lower technology nodes. In this PhD, innovative solutions to remove the biasing circuitry are studied. The most promising is a "MEMS in the loop" architecture, which takes advantage of time-encoded techniques to read the sensor and convert its information into the oscillation frequency of a VCO, whose output is processed solely by digital electronics. A first prototype was implemented, and the results prove the effectiveness of the architecture. |
Configurable electrical stimulation system for cardiac tissue samples ABSTRACT. The aim of this work is the development of tools to improve the management of cardiovascular diseases through ex vivo heart research. For cardiovascular disease assessment, cardiac tissue samples are cultured long-term, which changes their properties. To prevent this dedifferentiation, electro-mechanical stimulation is applied. Electrical stimulation generates a load-dependent voltage artifact, which can hinder the measurement process. To avoid the artifact effects, a stimulation system and a stimulation and recording electrode array are designed. |
Towards highly efficient audio processing: New Analog-to-Digital and Analog-to-Information architectures for Edge devices ABSTRACT. In recent years, audio interfaces have become more usual not only for phones and computers, but also for devices like TV remotes, home automation and truly-wireless earphones. Being battery-powered, these devices require highly efficient circuits for their always-on operation. However, even though digital processors have become more efficient thanks to Moore’s Law, analog processing presents a challenge when trying to implement the same architectures in lower lithography nodes. In this paradigm, my thesis research presents two proposals to tackle these issues, making more efficient ADCs and analog-to-information architectures for Edge devices. |
12:10 | A 5.2-GS/s 8-Parallel 1024-Point MDC FFT PRESENTER: Pedro Paz ABSTRACT. This paper presents an efficient 8-parallel 1024-point multi-path delay commutator (MDC) fast Fourier transform (FFT) implementation on a field-programmable gate array (FPGA). The selection of the FFT algorithm and the data orders yields an architecture with 23 non-trivial rotators, which is the minimum number achieved so far. Additionally, the non-general rotators in the architecture are trivial rotators, constant rotators and 1-rots, which require very few resources to be implemented. The deep pipelining in the architecture allows a throughput of 5.2 GS/s to be reached. |
12:30 | Any-Radix Efficient Parallel Implementation of the Fast Fourier Transform on FPGAs PRESENTER: Juan Antonio López Martín ABSTRACT. This paper presents the results of a thorough investigation of arbitrary and mixed-radix architectures of the Fast Fourier Transform (FFT) for their efficient implementation on high-performance FPGA devices. We have developed a novel recursive approach that provides the hardware description of the logic architecture of the FFT using parameterized complex matrix multipliers. This approach makes it possible to easily and efficiently generate parallel implementations for an arbitrary number of points. In addition, the proposed architecture accepts deep pipelining to meet very demanding timing requirements. The largest implementation provided surpasses the 100 GS/s throughput threshold. |
12:50 | An Automatic Generator of Non-Power-of-Two SDF FFT Architectures for 5G and Beyond PRESENTER: Víctor Manuel Bautista Loza ABSTRACT. This paper presents an automatic generator for non-power-of-two (NP2) single-path delay feedback (SDF) fast Fourier transform (FFT) architectures. Previous generators support sizes that are powers of two. In contrast, the proposed generator supports sizes that are products of powers of 2, 3 and 5, which are the sizes reported in the physical layer description of 5G. The proposed approach not only includes the software tool, but also utilizes hardware optimization techniques to optimize the architectures. This makes the generated code more understandable for the user. In order to measure the capabilities of the proposed generator, FFTs of a wide range of sizes have been implemented and reported, achieving high performance. |
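The size family described in the abstract above, products of powers of 2, 3 and 5, can be checked with a few lines of code. The sketch below is only an illustration of that constraint; the function name is hypothetical and it is not part of the presented generator.

```python
def factor_235(n):
    """Return (a, b, c) such that n == 2**a * 3**b * 5**c, or None.

    None means n has a prime factor other than 2, 3 or 5 and therefore
    falls outside the size family described in the abstract.
    """
    if n < 1:
        return None
    exponents = []
    for prime in (2, 3, 5):
        e = 0
        while n % prime == 0:   # strip this prime factor completely
            n //= prime
            e += 1
        exponents.append(e)
    return tuple(exponents) if n == 1 else None


for size in (1024, 1200, 1536, 14):
    print(size, factor_235(size))
```

The exponent triple also indicates how many radix-2, radix-3 and radix-5 stages a mixed-radix decomposition of that size would need, under the assumption that each stage handles one prime factor.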
13:10 | High-Throughput DTW accelerator with minimum area in AMD FPGA by HLS PRESENTER: Javier Hormigo ABSTRACT. Dynamic Time Warping (DTW) is a dynamic programming algorithm that is known to be one of the best methods to measure the similarity between two signals, even if they vary in speed. It is extensively used in many machine learning algorithms, especially for pattern recognition and classification. Unfortunately, it has quadratic complexity, which results in very high computational costs. Furthermore, its data dependencies make it very difficult to parallelize. Special attention has been paid to computing DTW on the edge, as a way to reduce the communication load in Internet-of-Things applications. In this work, we propose a minimum-area implementation of the DTW algorithm in AMD FPGAs with optimal use of the resources. That is achieved by maximizing the time the resources are in use and taking advantage of the inner structure of the AMD FPGAs. This architecture could be used in small devices or as a base for a multi-core implementation with very high throughput. |
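The quadratic cost and the data dependency mentioned in the abstract above are both visible in the textbook DTW recurrence, sketched here in software. This is the standard dynamic-programming formulation with an absolute-difference cost, chosen as an assumption for illustration; it is not the presented FPGA architecture.

```python
def dtw_distance(a, b):
    """Classic DTW distance between two numeric sequences.

    Runs in O(len(a) * len(b)) time and keeps only two rows of the
    cost matrix in memory.
    """
    inf = float("inf")
    m = len(b)
    prev = [0.0] + [inf] * m          # row 0 of the accumulated-cost matrix
    for i in range(1, len(a) + 1):
        curr = [inf] * (m + 1)
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Each cell needs its upper, left and diagonal neighbours,
            # which is the dependency that hinders parallelization.
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[m]


print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # same shape, different speed
print(dtw_distance([0, 0], [1, 1]))
```

Because a cell depends on its left neighbour in the same row, cells can only be computed concurrently along anti-diagonals, which is one common way such accelerators are organized.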
12:10 | UML-Based Design Flow for Systems with Neural Networks PRESENTER: Hector Posadas ABSTRACT. Artificial intelligence has demonstrated its ability to solve many critical tasks, but at the cost of high computational requirements. Different devices, such as CPUs, GPUs and FPGAs, have been proposed to provide this computational power, each with its benefits and drawbacks. However, exploring the different alternatives in an easy and integrated way is still a complex task. To address this, this paper proposes a UML-based design flow where neural networks are initially specified and then automatically generated and trained using TensorFlow. The approach also enables automatic mapping of models to CPUs, GPUs and FPGAs, using Xilinx's Deep Learning Processor Units (DPUs). The framework also generates the communication code required to connect the other system components with the selected implementation. This approach addresses design-space exploration challenges and system architecture definition, and improves implementation and training processes by saving time and effort. |
12:30 | Analog/Mixed-Signal Standard Cell Based Approach for Automated Circuit Generation of Neural Network Accelerators PRESENTER: Roland Müller ABSTRACT. Analog and mixed-signal neural network accelerators are a promising solution to apply deep learning methods to edge applications where high energy and area efficiency are required. Such in-memory computing implementations use regular and repetitive circuit structures that take great advantage of design automation. An analog/mixed-signal standard cell design approach in combination with an automation framework has been developed to ease the design of such systems. The framework discussed here provides the basic functionality such as schematic and layout creation. It is based on manually designed standard cells and technology and topology parameters to steer the automation. The presented methodology drastically reduces the (re-)design time and engineering effort leading to a reduced time-to-market whilst errors occurring in manual executed circuit design can be avoided. |
12:50 | Machine learning infrastructure for managing a electric vehicle fleet using a cyber-physical system framework PRESENTER: Pedro Blanco-Carmona ABSTRACT. The purpose of this paper is to provide machine learning algorithms and the cloud infrastructure for a cyber-physical system for managing batteries in a fleet of electric vehicles. The paper proposes dynamic algorithms for the diagnosis and control of these batteries. The system could re-train and update them, which prevents the aging of the algorithms and validates them over the lifetime of the battery. Furthermore, the system has been validated in a laboratory test bank. |
13:10 | Approximate arithmetic aware training for stochastic computing neural networks PRESENTER: Christiam Franco Frasser ABSTRACT. Deploying modern neural networks on resource-constrained edge devices requires a series of optimizations to prepare them for production. These optimizations typically involve pruning, quantization, and fixed-point conversion to compress the model size and improve energy efficiency. While these optimizations are generally sufficient for most edge devices, there is potential for further improving energy efficiency by leveraging special-purpose hardware and unconventional computing paradigms. In this work, we investigate stochastic computing neural networks and their impact on quantization and overall performance with respect to weight distributions. When arithmetic operations such as addition and multiplication are performed by stochastic computing hardware, the arithmetic error may increase significantly, resulting in reduced overall accuracy. To bridge the accuracy gap between a fixed-point model and its stochastic computing implementation, we propose a new approximate arithmetic aware training method. We demonstrate the effectiveness of our approach by implementing the LeNet-5 convolutional neural network on an FPGA. Our experimental results show a negligible accuracy degradation of only 0.01% compared to the floating-point outcome, while achieving a significant 27x speedup and 33x improvement in energy efficiency compared to other FPGA implementations. Moreover, our proposed method increases the likelihood of selecting optimum LFSR seeds for SC systems. |
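To show where the arithmetic error discussed in the abstract above comes from, here is a minimal sketch of unipolar stochastic computing multiplication, in which a value in [0, 1] is encoded as the fraction of ones in a bitstream and a bitwise AND multiplies two independent streams. It is an illustration under assumptions: Python's `random.Random` stands in for the hardware LFSRs, and the function names, stream length and seeds are hypothetical.

```python
import random


def to_bitstream(value, length, seed):
    """Encode a value in [0, 1] as a random bitstream with P(bit = 1) = value."""
    rng = random.Random(seed)   # software stand-in for a seeded LFSR
    return [1 if rng.random() < value else 0 for _ in range(length)]


def sc_multiply(x, y, length=4096, seed_x=1, seed_y=2):
    """Estimate x * y by AND-ing two bitstreams and counting the ones."""
    stream_x = to_bitstream(x, length, seed_x)
    stream_y = to_bitstream(y, length, seed_y)
    ones = sum(bx & by for bx, by in zip(stream_x, stream_y))
    return ones / length


independent = sc_multiply(0.75, 0.5)                    # close to 0.375
correlated = sc_multiply(0.5, 0.5, seed_x=1, seed_y=1)  # close to 0.5, not 0.25
print(independent, correlated)
```

The second call reuses one seed for both operands, so the streams are identical and the AND gate returns the operand itself rather than the product. This is a simple way to see why the choice of random-number seeds matters for accuracy in such systems.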
12:10 | Automatic code generation from UML for data memory optimization in microcontrollers PRESENTER: Hector Posadas ABSTRACT. The design of applications for microcontrollers is typically constrained by the limited hardware capabilities of these devices. As in any embedded system, the specifics of each application should be analyzed to overcome these limitations, but this is not easy to do. To help in this process, this paper proposes an analysis of some communication semantics, their potential impact on data memory usage, and alternatives to minimize it. Moreover, a design tool capable of automatically generating code for microcontrollers from UML models is proposed. That way, engineers can automatically generate implementations from the communication semantics specified in the UML model, and the design alternatives can be explored with minimal recoding effort. |
12:30 | Low-power EEGNet-based Brain-Computer Interface implemented on an Arduino Nano 33 Sense PRESENTER: Daniel Enériz ABSTRACT. The use of Convolutional Neural Networks (CNNs) to process Electroencephalograph (EEG) signals has been introduced in recent years with great success in the field of Brain-Computer Interfaces (BCI). Nevertheless, in order to advance towards a CNN-based BCI prototype, these networks must be efficiently mapped onto low-power and low-cost hardware, enabling real-time, portable and Internet-independent brain-computer communication. This work presents the implementation of an EEGNet-based model on an ARM Cortex M4F microcontroller, available on the Arduino Nano 33 Sense. Starting from models trained on the Physionet Motor Movement/Imagery dataset, 8-bit integer post-training quantization has been applied to reduce computing complexity, with a mean accuracy loss of 2.64±0.77%. Moreover, the computational impact and memory footprint of the models have been characterized by measuring the associated operations and Random-Access Memory (RAM) usage. Finally, a selected model has been implemented on the ARM Cortex M4F, with a latency of 137 ms and an energy per inference of 2.55 mJ, 40% lower than another EEGNet implementation on the same microcontroller. |
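The 8-bit integer post-training quantization mentioned in the abstract above can be illustrated with a minimal sketch of symmetric per-tensor quantization. The abstract does not state which scheme or toolchain was used, so the symmetric scaling, the function names and the sample weights here are all assumptions.

```python
def quantize_int8(weights):
    """Map floats to signed 8-bit integers with one shared scale factor."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the int8 range.
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from the quantized integers."""
    return [v * scale for v in q]


weights = [-1.0, -0.2, 0.0, 0.33, 1.0]   # hypothetical trained weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
print(q, [round(r, 4) for r in restored])
```

Each weight is off by at most half a quantization step after the round trip, and the accumulated effect of those small errors across layers is what shows up as an accuracy loss after quantization.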
12:50 | Evaluating the soft error sensitivity of LU decomposition on low-power and high-performance GPUs PRESENTER: Jose A. Belloch ABSTRACT. GPUs have become an essential component of many embedded systems and also of the nodes in the top supercomputer centers. Due to their large memories and state-of-the-art technology, they are particularly prone to transient errors. It is therefore worthwhile to analyze their error sensitivity and to design error mitigation techniques, as well as fault-tolerant algorithms, for this kind of device. In this paper we evaluate the soft-error sensitivity of two versions of LU decomposition on two very different GPUs, namely a GPU included in a low-power SoC device and a high-end massively parallel GPU. We perform injection campaigns on both GPUs and study the vulnerability of the algorithms, the causes of the errors, and the critical components of the code that most need protection. Experiments show that single bit-flips in the result of one instruction of the code produce errors in most cases, but that a good use of the GPU resources can increase the number of masked faults. They also show that different sections of the code behave quite differently regarding the number of errors and their propagation to many elements of the result matrix. |
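The single bit-flip fault model referred to in the abstract above can be illustrated on a single-precision value. The sketch below is not the authors' injection tool, which the abstract does not describe; the function name is hypothetical and it only shows how the position of the flipped bit changes the size of the resulting error.

```python
import struct


def flip_bit_float32(value, bit):
    """Flip one bit (0 = least significant) of an IEEE 754 float32 value."""
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    (flipped,) = struct.unpack("<f", struct.pack("<I", raw ^ (1 << bit)))
    return flipped


print(flip_bit_float32(1.0, 0))    # mantissa LSB: tiny relative error
print(flip_bit_float32(1.0, 23))   # exponent LSB: value halves
print(flip_bit_float32(1.0, 30))   # exponent MSB: becomes infinity
print(flip_bit_float32(1.0, 31))   # sign bit: value negates
```

A flip in a low mantissa bit may be absorbed by later rounding, while a flip in the exponent or sign can corrupt every result element that depends on the value, which is consistent with the abstract's observation that fault effects differ widely.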
13:10 | Accelerators in Embedded Systems for Machine Learning: A RISCV View PRESENTER: Alejandra Sanchez-Flores ABSTRACT. Embedded systems, which are mobile systems with IoT and constrained processing capability, have mostly served as an embodiment of ubiquitous computing over the past ten years. Innovative applications, on the other hand, are expanding, consuming more data, and employing ever more complicated algorithms. To run complex algorithms, accelerators made for convolutional neural networks (CNNs), a machine learning technique, have been added to the usual architecture of embedded systems. Energy efficiency, adaptability, versatility, and heterogeneity are some of the characteristics that new devices handling ML in embedded applications must have. However, ML algorithms cannot be executed on embedded systems alone due to a lack of computational capacity. To compute CNN model inference, accelerators are recommended and integrated into the embedded system using the same energy- and resource-saving methods. The appeal of accelerators based on the RISC-V architecture is the possibility of deliberately altering the host processor to improve communication with the accelerator. This work seeks a balanced design between acceleration and flexibility to deploy several CNN model configurations. |