Utilizing Pattern Matching Technology for Custom Device Signature Flow
ABSTRACT. In the domain of electronic design automation (EDA), accurate netlisting of custom devices such as inductors is crucial for effective circuit verification. Traditional methods often struggle with the complexity and variability inherent in custom device signatures. This paper explores the application of pattern matching technology to streamline the netlisting process, enhancing both accuracy and efficiency. We delve into the specific challenges associated with netlisting custom inductors and demonstrate how pattern matching can address these issues. The benefits of this technology in physical and circuit verification are highlighted, showing its potential to revolutionize EDA workflows.
Energy Harvesting from Ultrasonic Waves Using Piezoceramic Square Shape Transmitter and Receiver Applicable in Biomedical Implanted Devices
ABSTRACT. In this paper, we theoretically analyze a one-dimensional model of a piezoceramic energy harvester that uses piezoelectric transduction in the 3-3 mode to convert ultrasonic pressure waves into electrical energy. Our approach is new in that it does not rely on the impedance approach, yet it still accounts for the losses of the acoustic environment. We present a design method that first calculates the acoustic source strength and then determines two transducer parameters, the thickness (the geometry of the system) and the excitation frequency, at a given excitation voltage amplitude. Our goal is to extract maximum power from the output load. Once this frequency, the thickness, and the associated pressure at a given distance between transducers are specified, we can plot the output power versus the load to obtain graphically the optimum load at which the power becomes maximum. It should be noted that both sides of the transmitter transducer are assumed to be free to radiate acoustic waves; otherwise, the boundary conditions for the constitutive laws of the piezoceramic should be altered accordingly. For a square transducer with a thickness of $2.7 mm$ and a side length of $1.46 cm$, and for an $R-C$ output load, the output power reaches its maximum of $57.6 mW$ at an output resistance of $797 \Omega$. The required acoustic source strength is $1033.13 \frac{cm^3}{\mu s}$, which produces a pressure of $68.745 kPa$ at a distance of $10 cm$ on the receiver side.
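The load-sweep step of this design method can be illustrated with a short numerical sketch. The source values below are hypothetical placeholders, not the paper's derived model; for a purely resistive load driven from a Thevenin-equivalent source, the sweep peaks near R = |Z_s|:

    import numpy as np

    # Illustrative power-versus-load sweep (hypothetical Thevenin source;
    # the paper derives the equivalent source from its 1-D piezoceramic model).
    V_s = 10.0                      # assumed open-circuit voltage amplitude (V)
    Z_s = 500 + 300j                # assumed source impedance (ohms)
    R = np.linspace(1, 5000, 5000)  # candidate load resistances (ohms)
    P = 0.5 * np.abs(V_s / (Z_s + R))**2 * R   # average power in the load (W)
    print(R[np.argmax(P)], P.max())            # optimum load, maximum power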
Asynchronous RISC-V Processor for Embedded Sensors
ABSTRACT. The growth of the market for wearable devices and IoT nodes leads to an increasing demand for sensors, given their number in such devices. Each device uses a highly complex on-board microprocessor to run algorithms, but these processors consume a lot of power. The role of future sensors will be to offload some of the processing from the device to the sensor, thereby reducing the load on the device's processor. Energy efficiency is essential for wearable devices and IoT nodes, as battery life is directly linked to energy consumption, which in turn determines how long the device stays charged, a key concern for manufacturers. The power consumption of sensors must therefore be minimised: a device typically has more than one sensor, and each should contribute as little as possible to the overall power consumption. In recent years, processors have increasingly been integrated into sensors to reduce the load on the external processor. A new open-source Instruction Set Architecture (ISA), known as RISC-V, is gaining prominence in the embedded microprocessor arena due to its modularity and extensibility. This allows the incorporation of new mathematical operations that speed up algorithms and reduce power consumption by reducing the number of cycles. This paper proposes a state-of-the-art minimal asynchronous RISC-V processor, designed for embedded sensors, that executes algorithms and then enters sleep mode upon completion, with the aim of reducing system power consumption. The asynchronous RISC-V processor, implemented on an ASIC, has a lower start-up peak current than its synchronous equivalent, potentially enabling a simpler Low Dropout Regulator (LDO) design. In our design, the LDO power consumption was halved compared to the synchronous model; a significant saving, given that the LDO accounts for a considerable part of the total sensor power consumption.
RISC-V System-On-Chip designed to determine the speed of an object and display data on an OLED screen
ABSTRACT. Urbanization and transportation infrastructure have grown significantly, and the expansion of the automobile fleet and the diversification of vehicle types on the roads have created a complex and often dangerous environment for drivers, passengers, and pedestrians. To address this, a speed control system has been developed using Time-of-Flight (ToF) technology. This project aims to design a RISC-V based System-on-Chip (SoC) capable of determining the speed of an object at a distance of up to 5 meters and displaying the processed data on an OLED graphic display. The proposed system enhances road safety by enabling real-time, accurate detection of non-motorized vehicle speeds, which is crucial for implementing preventive measures.
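At its core, the speed estimation such a system performs reduces to differencing successive ToF distance readings. A minimal sketch of that computation follows; read_tof_distance is a hypothetical driver call, as the abstract does not name the sensor API:

    import time

    # Speed as the discrete derivative of two ToF distance readings.
    def measure_speed(read_tof_distance, dt=0.05):
        d0 = read_tof_distance()      # distance in metres
        time.sleep(dt)                # sampling interval in seconds
        d1 = read_tof_distance()
        return (d0 - d1) / dt         # m/s, positive when approaching

    # e.g. speed_kmh = 3.6 * measure_speed(read_tof_distance)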
How to implement the Hungarian algorithm into hardware
ABSTRACT. The Hungarian algorithm solves the assignment problem in polynomial time and is used in many applications, such as subcarrier allocation for OFDM. Hardware implementations can satisfy real-time constraints but are difficult to derive from the algorithm. In this work we devise a method to translate a state-based version of it into a hardware description suitable for FPGA synthesis. The resulting model reveals that the memory layout and the communication with the logic circuitry are critical, and that further optimization is required to achieve competitive results with respect to software.
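The assignment problem the hardware solves can be stated in a few lines of software, which also serves as a golden reference when verifying an FPGA implementation. A sketch using SciPy's solver for the same problem:

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    # Example: assign 4 subcarriers to 4 users at minimum total cost.
    cost = np.array([[4, 1, 3, 2],
                     [2, 0, 5, 3],
                     [3, 2, 2, 4],
                     [1, 3, 4, 0]])
    rows, cols = linear_sum_assignment(cost)   # optimal assignment
    print(list(zip(rows, cols)), cost[rows, cols].sum())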
An Innovative On-Chip Ultra-Low Quiescent Current Energy Management Circuit for Battery-less and Harvester IoT Systems
ABSTRACT. This paper provides an overview of conventional techniques for power management integrated circuits (PMIC) in battery-less applications, with a specific focus on energy harvesting (EH) systems. We thoroughly illustrate the typical structure of PMICs used in these applications, emphasizing energy management (EM) circuits. The paper explores practical methods for EM implementation and discusses the design challenges associated with these systems. Finally, we introduce an ultra-low power EM circuit designed for on-chip applications. Simulations conducted using the 65 nm CMOS process technology demonstrate that the proposed circuit consumes less than 100 nA at a supply voltage of 3 V. This circuit is well-suited for a range of applications that necessitate on-chip implementation and ultra-low quiescent current for the PMIC block.
HLS Synthesis: Practical hit and miss analysis of AES cipher descriptions
ABSTRACT. This article studies different descriptions of an AES cipher using the Vitis HLS tool. Three AES cipher designs, differing only in their HLS description, were selected to analyse the differences in the resulting implementations caused by describing the behaviour of the same circuit in different ways. Simulation, synthesis and verification were performed for the three designs, and the results were compared with a hardware implementation of the AES. At best, the results were slightly inferior to the hardware implementation. It can be concluded that a high-level design may not be superior to an RTL design, depending on the specific design in question.
Low Power Single-Event Latch-up Detector with Embedded Current Sensor
ABSTRACT. Radiation exposure in space environments can cause Single Event Effects in integrated circuits, affecting the functionality of CubeSats. This paper presents an innovative solution to detect and mitigate Single Event Latch-up using a current sensor that optimises power consumption. The proposal is designed to improve efficiency without sacrificing response speed, offering a reliable alternative to protect CubeSat electronics in harsh environments.
Hands-On IoT: Implementing MQTT protocols for Sensor Networks
ABSTRACT. The goal of this paper is to present the laboratories carried out in the subject of Communication Networks, which belongs to the Computer Science degree, to improve students' understanding of one of the chapters they find most complex: the definitions and protocols related to the Internet of Things (IoT). With the inclusion of these laboratories, in the last academic year we improved the students' knowledge, and therefore their grades, by more than 50% compared to previous years: average grades rose from 3.5 over the last six years to 6.1 in the current year. In this paper we introduce the theoretical concepts of the IoT and explain one of the most widely used protocols in IoT: MQTT, a protocol that sits in the OSI stack just above the transport layer. The main actors involved in this protocol, the broker, the publisher and the subscriber, are also defined, as well as how the communication between them works. Subsequently, we show the fundamental characteristics of the laboratories and, finally, the conclusions obtained.
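A minimal publisher/subscriber pair, of the kind such a laboratory typically starts from, can be written with the paho-mqtt library (version 2.x assumed; the broker and topic names below are placeholders):

    import paho.mqtt.client as mqtt

    def on_message(client, userdata, msg):
        # the broker forwards every message published on a subscribed topic
        print(f"{msg.topic}: {msg.payload.decode()}")

    sub = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2)
    sub.on_message = on_message
    sub.connect("test.mosquitto.org")          # public test broker
    sub.subscribe("lab/sensors/temperature")
    sub.loop_start()                           # network loop in background

    pub = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2)
    pub.connect("test.mosquitto.org")
    pub.publish("lab/sensors/temperature", "21.5")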
RISC-V based Spacewire Node implemented on European Radiation Hardened FPGA Devices
ABSTRACT. This research presents an SoC implementation of a SpaceWire node, consisting of an open 32-bit RISC-V CPU and an HDL SpaceWire IP core on a European Radiation-Hardened SRAM FPGA (NanoXplore) and a Microchip (Microsemi) FLASH-based FPGA. Both designs were implemented and simulated using the commercial design suites provided by each vendor. Verification of the designs was conducted using two evaluation kits, while the validation of the SoC nodes was performed through conformance tests using SpaceWire commercial testing equipment.
SpaceWire is a communication protocol widely adopted in spacecraft for connecting instruments to data processors, mass memory, and control processors. Field-Programmable Gate Arrays (FPGAs) are a popular choice for implementing SpaceWire nodes due to their flexibility in meeting the unique requirements of each program or product.
Comparative Analysis of AI Classification Models for Energy Efficiency on Raspberry Pi IoT Nodes
ABSTRACT. The convergence of artificial intelligence (AI) and the Internet of Things (IoT) offers unprecedented opportunities for smart automation across various sectors. However, high energy demands challenge the widespread adoption of AI-IoT systems, impacting their sustainability and operational costs. This paper analyzes the energy efficiency of several AI classification models on IoT platforms, focusing on random forests, support vector machines, and convolutional neural networks such as VGG16 and MobileNet. These models are tested on three Raspberry Pi models (1B+, 3B+, and 4B) to determine which best balances performance with minimal energy use. Situated in the context of agricultural pest management, our study measures the models' power consumption and processing efficiency, crucial for deploying eco-friendly pest control solutions. By optimizing AI classifiers for energy, we aim to reduce environmental impacts and enhance the feasibility of AI-IoT systems in resource-limited settings.
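The per-model comparison rests on measuring latency and energy per inference. A sketch of how such figures are typically obtained (our illustration, not the paper's exact setup; model stands for any of the evaluated classifiers and mean_power_w for a reading from an external power meter):

    import time

    def energy_per_inference(model, batch, mean_power_w, runs=100):
        t0 = time.perf_counter()
        for _ in range(runs):
            model.predict(batch)                      # one classification pass
        latency = (time.perf_counter() - t0) / runs   # seconds per inference
        return latency, mean_power_w * latency        # joules per inference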
On-Device Dataset Distillation: The MNIST Use Case
ABSTRACT. In this study, a real-time dataset distillation technique is explored using the MNIST dataset, achieving a reduction in data size of up to 90\%. The distillation process is implemented on an M5StickC Plus (M5C+) device, with data exchanged efficiently via MQTT between the device and a Raspberry Pi. A training experiment was conducted, demonstrating that the model's performance was comparable to training on the full dataset. This method effectively addresses computational constraints in embedded systems, enabling streamlined evaluation of dataset reduction, which is critical for maintaining data privacy, preventing model drift, and supporting model customization and federated learning. By significantly reducing the data footprint, it minimizes memory usage and accelerates subsequent model training processes. These findings demonstrate the feasibility of optimizing dataset size for efficient deployment on microcontroller-based Internet of Things (IoT) devices, showcasing the integration of MQTT to enhance inter-device communication during data-intensive tasks.
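As a baseline for what shrinking MNIST to roughly 10% means in code, the sketch below draws a per-class random subset before transmission; this is only a stand-in illustration, not the distillation algorithm the paper implements:

    import numpy as np

    def subset_per_class(X, y, keep=0.1, seed=0):
        """Keep a fixed fraction of samples from each digit class."""
        rng = np.random.default_rng(seed)
        idx = np.concatenate([
            rng.choice(np.flatnonzero(y == c),
                       int(keep * np.sum(y == c)), replace=False)
            for c in np.unique(y)
        ])
        return X[idx], y[idx]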
Preliminary analysis on encompassing IoT devices in a heterogeneous environment exploiting Federated Learning
ABSTRACT. Federated Learning (FL) enables collaborative model training without sharing raw data, making it a promising approach for decentralized machine learning. However, integrating heterogeneous devices, especially resource-constrained Internet of Things (IoT) clients such as microcontrollers, poses significant challenges. These devices often have limited power and memory, which can prevent effective communication and model training.
This paper provides a simulation of the FL framework involving IoT clients cooperating to train a neural network (NN) model optimized for implementation on microcontrollers.
Our primary contribution is demonstrating how to address the limitations that arise from a heterogeneous environment.
Additionally, to simulate real-world conditions, we introduced training delays and examined their impact on energy consumption, model performance, and system heterogeneity.
In our experiments, we measured critical parameters such as energy consumption, training duration, inference time, and model accuracy. Our findings reveal the trade-offs between resource usage and model performance, providing preliminary guidance on how to incorporate IoT clients into FL systems.
Overall, our study demonstrates in practice the deployment of FL in realistic environments with different device capabilities.
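The aggregation step at the heart of such a simulation is usually FedAvg, a sample-count-weighted average of the clients' parameters; the paper does not name its aggregation rule, so the sketch below assumes this standard choice:

    import numpy as np

    def fedavg(client_weights, client_sizes):
        """FedAvg: weight each client's parameters by its local sample count.
        client_weights: one list of numpy arrays (layers) per client."""
        total = sum(client_sizes)
        return [
            sum(w[layer] * (n / total)
                for w, n in zip(client_weights, client_sizes))
            for layer in range(len(client_weights[0]))
        ]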
SOC Accelerator Communication Overhead: Impact of Cache, Data Size and Communication Handling
ABSTRACT. Lookaside Accelerators (LAAs) hold the potential to significantly enhance system performance by offloading computationally demanding tasks from general-purpose CPUs. However, the communication overhead of moving data between the LAA and the CPU can be a significant limitation, diminishing the anticipated benefits of an LAA. This research investigates the complex factors shaping communication overhead and their interplay with the LAA's overall performance gain, examining key system factors including the cache hierarchy, data sharing, and communication handling (interrupt vs. polling). Employing an AMD MPSoC platform, we conduct a detailed analysis of an LAA to understand these factors for various data sizes offloaded by an embedded CPU (ARM) core onto an embedded LAA. Our experimental study highlights the importance of data locality, the impact of shared memory access patterns, and the trade-offs associated with different cache configurations. Additionally, this research reveals the general structure of communication overhead and provides practical recommendations for designing and optimizing LAAs, ultimately enabling substantial performance improvements in SoCs and MPSoCs with embedded offload accelerator technology.
A PLL-Based Self-Clocked ECG Data Acquisition Front-End
ABSTRACT. The use of remote and wearable cardiovascular monitoring devices is becoming widespread in everyday life. However, power consumption poses a challenge, limiting the devices' autonomy, weight, and miniaturization. The proposed ECG data acquisition front-end synchronizes the ECG sampling clock with a multiple of the input heart rate and utilizes the first derivative of the ECG signal to adjust the sampling frequency to the rapid changes associated with different phases of the heart’s electrical activity. This method reduces power consumption and decreases the volume of captured data, with minimal loss of accuracy.
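A behavioural sketch of the derivative-driven sampling idea (our reading of the approach, not the circuit itself; the rate limits are placeholder values):

    import numpy as np

    def adaptive_sample(ecg, t, f_min=50.0, f_max=500.0):
        """Sample faster where the ECG slope is steep (QRS complex),
        slower on flat segments, saving power and data volume."""
        slope = np.abs(np.gradient(ecg, t))
        f = f_min + (f_max - f_min) * slope / slope.max()  # rate per sample
        kept, t_next = [], t[0]
        for i in range(len(t)):
            if t[i] >= t_next:
                kept.append((t[i], ecg[i]))
                t_next = t[i] + 1.0 / f[i]
        return kept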
AISAD: An AI-Assisted Tool for the High-Level Design of Sigma-Delta Modulators
ABSTRACT. This paper presents a toolbox for the optimization and automated design of Analog-to-Digital Converters (ADCs). The tool combines Artificial Neural Networks (ANNs) and behavioral simulation for the optimized high-level sizing process, i.e. to map system-level specifications into building-block requirements. To this end, ANNs are trained to identify the best ADC architecture for a given set of specifications as well as the optimum set of design parameters which yield these specifications with the minimum power consumption. Trained ANNs are iterated with time-domain behavioral simulation in order to find the best figure of merit. The presented toolbox – applied to Sigma-Delta Modulators (Σ∆Ms) – includes a Graphical User Interface (GUI) implemented in MATLAB, which guides the user through all steps: from the definition of specifications to the verification of the obtained results.
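In outline, the high-level sizing regression can be reproduced with any ANN library. The sketch below uses scikit-learn with placeholder data; the actual tool is trained on behavioral-simulation results and implemented in MATLAB:

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    # X: system-level specs (e.g. bandwidth, target SNDR), normalized.
    # y: building-block parameters (e.g. OTA gain-bandwidth, sampling cap).
    X = np.random.rand(500, 2)       # placeholder training data
    y = np.random.rand(500, 3)
    ann = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000)
    ann.fit(X, y)
    sizing = ann.predict([[0.5, 0.8]])   # specs in, block requirements out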
Optimizing rectangular patch antenna design using machine learning techniques
ABSTRACT. In recent years, the evolution of wireless communication technologies has driven the development of more efficient and compact antenna designs. This paper presents a novel approach to optimizing rectangular patch microstrip antennas using machine learning (ML) techniques. Traditional optimization methods, which rely heavily on computationally intensive electromagnetic (EM) simulations, are often time-consuming and inefficient. To address these challenges, artificial neural networks (ANNs) are used to predict antenna dimensions and performance parameters, significantly accelerating the design process. The methodology involves generating a comprehensive dataset through EM simulations using Pathwave Advanced Design System (ADS), capturing key parameters such as resonant frequency, width, length, reflection coefficient ($S_{11}$), and gain. Two ANN models are developed: the first predicts the antenna's width and length from the resonant frequency, while the second predicts the $S_{11}$ and the gain from these dimensions. Experimental results demonstrate the high accuracy of the ANN predictions, with minimal discrepancies compared to EM simulations. This approach not only enhances design efficiency but also maintains high precision, showcasing the potential of integrating machine learning into antenna design optimization.
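The first (inverse) model can be sketched as follows. Here the training data are synthesized with the textbook transmission-line equations for a rectangular patch, purely so the example is self-contained; the paper instead builds its dataset from ADS EM simulations, and the substrate values below are assumptions:

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    c, eps_r, h = 3e8, 4.4, 1.6e-3          # assumed FR-4 substrate
    f = np.linspace(1e9, 6e9, 400)          # resonant frequencies (Hz)
    W = c / (2 * f) * np.sqrt(2 / (eps_r + 1))
    eps_eff = (eps_r + 1) / 2 + (eps_r - 1) / 2 / np.sqrt(1 + 12 * h / W)
    dL = (0.412 * h * (eps_eff + 0.3) * (W / h + 0.264)
          / ((eps_eff - 0.258) * (W / h + 0.8)))
    L = c / (2 * f * np.sqrt(eps_eff)) - 2 * dL

    # ANN 1: resonant frequency (GHz) -> width and length (mm)
    ann1 = MLPRegressor((64, 64), max_iter=5000)
    ann1.fit(f.reshape(-1, 1) / 1e9, np.c_[W, L] * 1e3)
    print(ann1.predict([[2.4]]))            # predicted W, L at 2.4 GHz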
A MQTT-based infrastructure to support Cooperative Online Learning Activities
ABSTRACT. Teaching the processes of designing digital electronic systems is becoming an increasingly challenging task. Design methodologies and tools have evolved to cope with ever-growing complexity and density, raising the abstraction level of the source design far away from the logic circuit. However, it is of paramount importance that fresh students start by understanding the fundamental concepts of Boolean algebra and the design and optimization of combinational and sequential gate-level circuits, before moving to more abstract concepts and tools.
For this, hands-on practice with simple real digital circuits is essential to understanding how digital circuits operate and how digital data is propagated and transformed from block to block. In this paper we present a distributed infrastructure based on the MQTT network protocol to support the deployment of distributed digital systems built with parts located in different physical locations; promoting the implementation of collaborative online learning and teaching activities is thus one of our main goals. Experimental results show latencies between remote sites in the range of a few tens of milliseconds, which is acceptable for running simple digital systems at the low speeds necessary for their operation to be perceived and understood by people.
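The pattern we infer from this infrastructure maps each inter-site signal to an MQTT topic: one site publishes the output of its sub-circuit, and the remote site subscribes and drives its local input with it. A sketch (the topic names and the drive_local_input helper are hypothetical):

    import paho.mqtt.client as mqtt

    site_b = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2)

    def on_message(client, userdata, msg):
        bit = int(msg.payload)              # output bit from site A's block
        drive_local_input("count_en", bit)  # hypothetical: drive local input

    site_b.on_message = on_message
    site_b.connect("broker.example.org")
    site_b.subscribe("lab/circuit1/wire/count_en")
    site_b.loop_forever()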
Cloud-Edge Continuum Infrastructure for Reconfigurable Multi-Accelerator Systems
ABSTRACT. In the cloud-edge continuum, distributed computing resources must be viewed as a whole from the user's perspective. This requires transparently virtualizing the underlying hardware to allow moving and scaling user applications across different computing resources. This can be particularly challenging when using reconfigurable systems due to their need to directly access the hardware underneath.
This paper presents an infrastructure to integrate these platforms in the cloud-edge continuum, allowing for the seamless deployment of user applications throughout its different layers. The infrastructure employs Kubernetes as a microservice-based solution to manage user applications across the continuum, and the ARTICo³ framework, extended to PCIe-based platforms in this work, to accelerate parallel sections of the target applications in hardware. As a result, the proposed infrastructure can be used to accelerate any user application on any FPGA-based device in the continuum. This infrastructure could also potentially exploit multi-tenant computing, where computing resources are shared among users, maximizing resource utilization. The benefits of the proposed solution, including virtualization, portability, and scalability, have been validated through an actual cloud-edge continuum implementation running the MachSuite benchmarks, inducing a worst-case overhead of 1.23% when compared against independent single node scenarios.
A Verilog-A superconducting qubit model for cosimulation with control and readout systems in the Cadence Analog Environment
ABSTRACT. This paper introduces two Verilog-A blocks implemented in the Cadence Analog Environment as testbenches for integrating qubits with the electronics necessary for qubit state manipulation and measurement in circuit-level simulations. These blocks specifically model the control and readout mechanisms of transmon qubits, a type of superconducting qubit, by using key transmon and circuit parameters as inputs. The first block models the XY qubit control mechanism, simulating qubit state manipulation by a microwave driving pulse with adjustable shape, amplitude, and duration. It outputs the qubit state probability and the pulse parameters needed for arbitrary qubit rotations. The second block models the dispersive shift readout mechanism, providing the transient response of the qubit to a reading pulse. This block can determine the qubit state based on the pulse's amplitude and phase and accounts for noise from the amplification stages. Simulation results using realistic transmon parameters validate the functionality of these Verilog-A blocks.
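For reference, in the standard resonant-drive picture (assumed here; the paper's model may include further non-idealities), the relation such an XY-control block evaluates is the pulse area setting the rotation angle and hence the excited-state probability:

$$\theta = \int_0^{t_p} \Omega(t)\, dt, \qquad P_{|1\rangle} = \sin^2\!\left(\frac{\theta}{2}\right),$$

where $\Omega(t)$ is the Rabi frequency set by the drive amplitude and $t_p$ is the pulse duration.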
Do Artificial Intelligences Dream of Electric Lambs? An Integrated Multi-Sensor System for AI-Driven Lamb Topology Recognition
ABSTRACT. The purpose of this paper is to present a multi-sensor system and a dataset-acquisition methodology for creating a depth dataset of lambs that includes images and characteristic parameters. A GUI for manual labelling is implemented to easily package images and metadata and send them to a custom server. This work will enable the creation of artificial intelligence models for lamb topology recognition. Furthermore, the system has been validated in a real environment with more than 600 lambs.
Designing DNNs for a trade-off between robustness and processing performance in embedded devices
ABSTRACT. Machine learning-based embedded systems employed in safety-critical applications such as aerospace and autonomous driving need to be robust against perturbations produced by soft errors.
Soft errors are an increasing concern in modern digital processors since smaller transistor geometries and lower voltages give electronic devices a higher sensitivity to background radiation.
The resilience of deep neural network (DNN) models to perturbations in their parameters is determined, to a large extent, by the structure of the model itself, as well as by the selected numerical representation and the arithmetic precision used.
When compression techniques such as model pruning and model quantization are applied to reduce the memory footprint and computational complexity for deployment, both the model structure and the numerical representation are modified, and thus the soft error robustness also changes.
In this sense, although the choice of activation functions (AF) in DNN models is frequently ignored, it conditions not only their accuracy and trainability, but also compressibility rates and numerical robustness.
This paper investigates the suitability of using bounded AFs to improve model robustness against DNN parameter perturbations, assessing at the same time the impact of this choice on deployment in terms of model accuracy, compressibility, and computational burden.
In particular, we analyze encoder-decoder DNN models aimed at performing semantic segmentation tasks on hyperspectral images for scene understanding in autonomous driving.
Deployment characterization is performed experimentally on an AMD-Xilinx KV260 SoM.
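A toy experiment conveys why bounded AFs matter for soft errors: a single bit flip in a weight's exponent can produce a huge activation that an unbounded function passes through unchanged, while a bounded one clips it. This is our illustration, not the paper's fault-injection framework:

    import numpy as np

    def flip_bit(x, bit):
        """Flip one bit of a float32 value via its integer representation."""
        i = np.float32(x).view(np.uint32)
        return (i ^ np.uint32(1 << bit)).view(np.float32)

    w = np.float32(0.8)
    w_bad = flip_bit(w, 30)                  # exponent MSB: 0.8 -> ~2.7e38
    relu  = lambda z: max(z, 0.0)            # unbounded: error propagates
    relu6 = lambda z: min(max(z, 0.0), 6.0)  # bounded: error clipped to 6
    print(relu(w_bad * 1.0), relu6(w_bad * 1.0))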
BiDSRS: Resource Efficient Real Time Bidirectional Super Resolution System for FPGAs
ABSTRACT. The burgeoning field of computer vision applications and visual display devices has propelled the exploration of super resolution (SR) techniques, drawing attention from both academia and industry. Field programmable gate arrays (FPGAs) have emerged as a favored platform for implementing intricate algorithms, given their energy efficiency and parallel computing capabilities. However, the demand for real-time SR systems employing deep learning poses challenges due to their resource-intensive nature, particularly when targeting edge or resource-constrained devices, creating a pressing need for energy- and resource-efficient SR solutions. Moreover, conventional SR methods predominantly focus on either upscaling or downscaling images or videos within a system. To bridge this gap, this paper proposes BiDSRS, a resource-efficient real-time SR system tailored for FPGAs that uses a modified bicubic interpolation method. BiDSRS facilitates scaling in both directions, allowing for upscaling and downscaling of images or videos. Evaluation on the Xilinx ZCU102 FPGA board demonstrates significant resource savings: a reduction of 46x in LUT, 31x in BRAM, and 41x in DSP utilization compared to state-of-the-art DNN-based SR systems, albeit with a throughput trade-off of 0.25x. Similarly, compared to leading algorithm-based SR systems, BiDSRS saves 23x LUT, 5x BRAM, and 4x DSP resources, with a throughput trade-off of 0.5x. Despite a reduced throughput of 4K at 30 frames per second (FPS), BiDSRS substantially decreases FPGA resource utilization for video SR tasks, offering support for sustainable and energy-efficient computing systems.
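For reference, the standard bicubic kernel such a system builds on (with the common choice a = -0.5; the paper's modification is not reproduced here), applied along one dimension; 2-D scaling applies the same filter separably along rows and columns:

    import numpy as np

    def bicubic_weight(x, a=-0.5):
        """Standard bicubic convolution kernel."""
        x = abs(x)
        if x < 1:
            return (a + 2) * x**3 - (a + 3) * x**2 + 1
        if x < 2:
            return a * x**3 - 5*a * x**2 + 8*a * x - 4*a
        return 0.0

    def resample_1d(samples, t):
        """Interpolate at fractional position t from 4 neighbouring samples."""
        i = int(np.floor(t))
        return sum(samples[(i + k) % len(samples)] * bicubic_weight(t - (i + k))
                   for k in range(-1, 3))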
A Serial Low-Switching FFT Architecture Specifically Tailored for Low Power Consumption
ABSTRACT. This paper presents a new FFT architecture called serial low-switching FFT, which has been specifically conceived for low power consumption. To achieve this, the architecture has been designed with the aim of reducing the switching activity of the components of the architecture. Experimental results show that this approach reduces the power consumption by approximately one third with respect to equivalent FFT architectures in the literature.
A Hardware-Efficient 1200-point FFT Architecture that Combines the Prime Factor and Cooley-Tukey Algorithms
ABSTRACT. In this paper, we present a hardware-efficient 1200-point single-path delay feedback (SDF) fast Fourier transform (FFT) architecture. Contrary to previous FFT architectures for non-power-of-two (NP2) sizes, which usually require a large number of hardware resources, the proposed approach significantly reduces the number of rotators required in the architecture. This is achieved by combining the prime-factor and Cooley-Tukey algorithms. As a result, the proposed architecture has only one non-trivial rotator between stages.
The effectiveness of this optimization is demonstrated through experimental results, where the proposed approach achieves a significant improvement over previous 1200-point FFTs in terms of hardware resources. Furthermore, the number of resources used in the proposed FFT is comparable to that of previous optimized 1024-point FFTs. This fact is highly relevant, since NP2 FFTs have traditionally been much less efficient than power-of-two (P2) FFTs. This paper thus breaks that old paradigm, showing that NP2 sizes can reach an efficiency similar to that of P2 ones.
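The key property being exploited is that, for coprime factors, the Good-Thomas prime-factor mapping removes the inter-stage twiddle factors entirely, leaving Cooley-Tukey only inside the sub-FFTs. The sketch below verifies this decomposition numerically for N = 1200 = 16 x 75 (a coprime split consistent with the abstract, though the paper's exact factorization is not stated here):

    import numpy as np

    def pfa_fft(x, N1, N2):
        """Good-Thomas PFA: twiddle-free split of an N1*N2-point FFT
        (requires gcd(N1, N2) == 1); sub-FFTs may use Cooley-Tukey."""
        N = N1 * N2
        n1, n2 = np.meshgrid(np.arange(N1), np.arange(N2), indexing="ij")
        grid = np.asarray(x)[(N2 * n1 + N1 * n2) % N]     # input index map
        G = np.fft.fft(np.fft.fft(grid, axis=0), axis=1)  # row/column FFTs
        s1, s2 = pow(N2, -1, N1), pow(N1, -1, N2)         # CRT inverses
        X = np.empty(N, dtype=complex)
        k1, k2 = np.meshgrid(np.arange(N1), np.arange(N2), indexing="ij")
        X[(N2 * s1 * k1 + N1 * s2 * k2) % N] = G          # output index map
        return X

    x = np.random.randn(1200) + 1j * np.random.randn(1200)
    print(np.allclose(pfa_fft(x, 16, 75), np.fft.fft(x)))  # True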
ABSTRACT. This paper presents an FPGA-based 32-parallel 1 million-point fast Fourier transform (FFT) architecture. The proposed implementation achieves a throughput of 12.8 GSps and a latency of 84.9 $\mu$s, making it the fastest 1 million-point FFT in the literature that fits in a single FPGA. Additionally, the modular design based on computing several sub-FFTs simplifies the architecture and facilitates the design, implementation, debugging, and maintenance of the proposed circuit.