A proof-of-concept ASIC RISC-V-based SoC for Industrial Applications
ABSTRACT. This work presents SoC4cris_p1, a proof-of-concept ASIC implementation of a RISC-V-based System-on-Chip (SoC) tailored for Industrial Internet of Things (IIoT) applications. Built on the neorv32 RISC-V HDL core, SoC4cris_p1 prioritizes execution safety and deterministic behavior. Key enhancements include a redesigned memory map for improved internal memory usage, an extended boot system, optimized peripherals, and the integration of four experimental IPs from XXX centers to support IIoT-specific functions. The paper covers the architectural modifications, front-end FPGA prototyping, back-end ASIC design in 65nm UMC technology, and validation on a custom test board.
Performance Analysis of Convolution Function for AI Edge Computing Acceleration using a 32-bit RISC-V CPU Implementation
ABSTRACT. This work presents a Convolution IP coprocessor (CIP) for RISC-V CPUs that accelerates the convolution operation between a 2x2 kernel and a 2x2 portion of the input layer. This processing is the basis of many artificial vision algorithms required in industry. Thanks to the RISC-V open instruction set and flexible architecture, it is feasible to explore different ways to accelerate heavy computations that nowadays impose relevant drawbacks to deploying some advanced vision processing on the resource-constrained semiconductor devices widely used in industrial applications. The CIP has been included in a custom RISC-V-based SoC prototype named SoC4cris, which is implemented in FPGA and 65nm UDSM technologies. The CIP is connected to the CPU via the AXI4-Lite external bus interface of the SoC. It has been implemented using fully combinational logic to achieve the maximum computational throughput. The CIP performance is compared to the CPU performance in terms of the number of clock cycles needed to complete a full convolution-layer computation task. The same test is executed with and without enabling specific RISC-V extensions for integer and floating-point arithmetic to analyze the performance versus hardware-resource trade-off. The results of our performance analysis demonstrate the exceptional efficiency of the CIP coprocessor: it improves computation performance by a factor of six while introducing no considerable overhead in terms of hardware resources. This underscores the robustness and reliability of our design.
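For reference, the elementary operation the CIP accelerates can be expressed in a few lines of Python; this software sketch only illustrates the 2x2 multiply-accumulate the abstract refers to, not the coprocessor's combinational implementation, and the example data are arbitrary.

import numpy as np

def conv2x2(input_layer, kernel):
    """Slide a 2x2 kernel over the input and accumulate dot products.

    Illustrative software reference only; the CIP performs the 2x2
    multiply-accumulate in combinational logic.
    """
    h, w = input_layer.shape
    out = np.zeros((h - 1, w - 1))
    for i in range(h - 1):
        for j in range(w - 1):
            window = input_layer[i:i + 2, j:j + 2]   # 2x2 portion of the input
            out[i, j] = np.sum(window * kernel)      # 4 multiplies + 3 adds
    return out

# Example: a 4x4 input and a 2x2 kernel produce a 3x3 output
x = np.arange(16, dtype=float).reshape(4, 4)
k = np.array([[1.0, 0.0], [0.0, -1.0]])
print(conv2x2(x, k))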
An Improved Discrete Time Amplifier-Less Potentiostat Architecture for Metabolic Sensing Applications
ABSTRACT. In this paper, an improved discrete-time potentiostat architecture is proposed. The focus is placed on the digital controller, aiming to enhance its performance, reduce measurement uncertainty, and increase speed. The system is modeled including its nonlinear behavior, and the impact of the digital controller parameters on the system response is analyzed through behavioral simulations. The developed mathematical model shows good agreement with the simulation results. The new controller reduces the measurement uncertainty compared to a purely integral control system by up to a factor of 100 in the worst-case scenario. The system speed is also improved by approximately 40%. The new digital controller not only improves the overall performance of the system but also does not significantly increase its hardware complexity.
Learning to Sense Sustainably: RL-Based Control for Solar-Powered IoT Nodes
ABSTRACT. Operating energy-harvesting IoT nodes in real-world, non-stationary environments poses critical challenges due to unpredictable energy availability and event-driven sensing needs. To address this, we propose a reinforcement learning-based approach that uses Proximal Policy Optimization (PPO) together with a LightGBM-powered solar energy forecasting model to dynamically control sensor activation and system wake intervals. Our method keeps the battery level within an optimal range, balancing system longevity and sensing performance, by using a multi-objective reward function that integrates sensing coverage, energy usage, and battery health, and by using the predicted solar input to guide policy learning. Results show that PPO agents trained with solar forecasts can achieve near-optimal trade-offs between energy efficiency and sensing fidelity. This approach demonstrates that predictive, learning-based control is a powerful and scalable solution for autonomous energy management in next-generation IoT deployments.
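A minimal sketch of how such a multi-objective reward could be composed is shown below; the weights, the target battery band, and the normalization are assumptions made here for illustration, since the abstract does not specify the actual reward formulation or the PPO/LightGBM integration.

def reward(coverage, energy_used, battery_level,
           w_cov=1.0, w_energy=0.5, w_batt=1.0,
           batt_low=0.3, batt_high=0.9):
    """Illustrative multi-objective reward for a solar-powered sensing node.

    coverage      : fraction of relevant events captured in the last interval (0..1)
    energy_used   : normalized energy spent in the last interval (0..1)
    battery_level : normalized state of charge (0..1)
    Weights and the target battery band are assumed values, not the paper's.
    """
    # Penalize leaving the optimal battery band, reward sensing, penalize energy use
    batt_penalty = max(0.0, batt_low - battery_level) + max(0.0, battery_level - batt_high)
    return w_cov * coverage - w_energy * energy_used - w_batt * batt_penalty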
A built-in CMOS Temperature Sensor for On-Chip Thermal Monitoring from 0 °C to 100 °C with 0.137 °C Inaccuracy
ABSTRACT. This work presents a temperature sensor designed in a 22 nm process using the 22-FDX design kit. The sensor exploits the dependence of the threshold voltage and carrier mobility on temperature for temperature-to-voltage conversion, covering a temperature range of 0 °C to 100 °C. A key contribution of this work is the exceptionally low error of +0.137 °C and -0.075 °C achieved after a simulated one-point calibration. In addition, the device achieves a temperature coefficient of -1.69 mV/°C over the whole temperature range. It consumes 23.11 μA from a 0.8 V DC supply and has an estimated die area of 5024.9 μm² (based on the schematic), of which approximately 5000 μm² correspond to two MOM capacitors. These capacitors are required for stand-alone measurement, but in real-life scenarios part of this capacitance corresponds to the load capacitance seen when an ADC is connected to the sensor's output. This temperature sensor is suitable for high-accuracy on-chip applications, standing out for its trade-off between area, temperature range, power consumption, error, and temperature coefficient.
Electromagnetic Side-Channel Attack on a Cloud-Based Fingerprint Recognition System
ABSTRACT. Cloud-based biometric recognition systems have gained widespread adoption across several sectors due to their advantages in terms of cost-efficiency, scalability, and performance. In these systems, the raw fingerprint images collected by sensors are transmitted to cloud servers through secure channels. The servers often use SoC-FPGAs to accelerate processing in hardware and incorporate advanced security measures, but they rarely address side-channel attacks that can retrieve biometric data at the operation level. To raise awareness of this problem, this paper presents an electromagnetic (EM) side-channel attack on a PYNQ-Z1 board, performed while the SoC-FPGA is reading the fingerprint image from the DDR3 memory. Fuzzy-logic-based rules are extracted during a training phase to capture the correlation between the measured electromagnetic emanations and the transmitted pixels. With that rule base, the attacker needs only one EM trace, acquired in less than 44.37 s, to reconstruct the fingerprint image, reaching a pixel-wise accuracy of 99.05%.
Low Entropy Masking Protection Scheme for ASCON Cipher to Counteract Side-Channel Attacks
ABSTRACT. Since NIST selected the ASCON cipher as a finalist in the lightweight cipher competition for constrained environments in February 2023, the cipher has been a focus of researchers, industry, and government. At the same time, hardware-implemented cryptographic algorithms have had to deal with so-called Side-Channel Attacks (SCA) since their emergence in the late 1990s. Although the ASCON algorithm is relatively recent, side-channel attacks that breach its security have already been proposed in the literature. In this paper, we present a design methodology for a low-entropy masking protection scheme in order to raise the ASCON algorithm's security level against SCA. To evaluate the proposed methodology, the ASCON permutation has been implemented in an Artix-7 Xilinx FPGA. The area overhead of the implemented design is 5.45% with respect to the unprotected implementation. A complete ASCON algorithm has also been manufactured in a 65nm TSMC ASIC technology, and a PCB has been designed and manufactured specifically to perform power measurements on the ASIC core for experimental SCA.
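As background, first-order Boolean masking splits a sensitive value into randomized shares, and a low-entropy scheme draws the masks from a small fixed set rather than the full mask space; the Python sketch below only illustrates that generic idea with an arbitrary mask set, not the specific scheme applied to ASCON's permutation in the paper.

import secrets

# Low-entropy masking draws masks from a small, fixed set instead of the full
# 2^n space; the set below is an arbitrary illustrative choice, not the paper's.
MASK_SET = (0x00, 0x3C, 0x55, 0x69, 0x96, 0xAA, 0xC3, 0xFF)

def mask_byte(x):
    """Split a sensitive byte into two Boolean shares (x ^ m, m)."""
    m = secrets.choice(MASK_SET)
    return x ^ m, m

def unmask_byte(share, mask):
    return share ^ mask

masked, m = mask_byte(0xA7)
assert unmask_byte(masked, m) == 0xA7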
A Lightweight AES Peripheral for RISC-V Cores and IoT Applications
ABSTRACT. In this article, we present a lightweight Advanced Encryption Standard (AES) peripheral suitable for implementation as a memory-mapped peripheral in RISC-V cores. The peripheral is based on an 8-bit serial implementation of AES, which drastically reduces the time required to encrypt a message at the cost of a modest increase in resource consumption. The peripheral is compared in terms of resource utilization and timing with a software implementation of AES, tinyAES-c, and with a hardware implementation that employs a more common 128-bit datapath, using the Series-7 FPGA technology of the manufacturer AMD-Xilinx. The results show that the peripheral is 71.84 times faster than the software implementation while increasing by 46.37% the logic required to implement the host RISC-V core, with the AES peripheral itself consuming only 475 LUTs.
Electric vehicle emulator for study as a Distributed Energy Resource
ABSTRACT. The integration of electric vehicles (EVs) into smart grids as distributed energy resources (DERs) is a key aspect of modern energy management. This paper presents the development and implementation of an electric vehicle emulator capable of simulating bidirectional energy flows in vehicle-to-everything (V2X) scenarios. The emulator is designed to analyze the impact of EVs on grid stability, demand response, and energy storage management. The system integrates a battery pack, power electronic systems, and a real-time control system to emulate the charging and discharging patterns of an EV under different grid conditions. Experimental results demonstrate the potential of the emulator in testing and optimizing control strategies for V2X applications, enhancing grid flexibility, and supporting the transition to renewable energy sources.
FPGA Architectures for Reliable Transmission of Pre-Stored Acoustic Signals in Underwater Localization Systems
ABSTRACT. Robust transmission of underwater acoustic signals is essential for the development of Internet of Underwater Things (IoUT) applications such as environmental monitoring, marine exploration, and the underwater localization of mobile entities. Underwater localization systems require the emission of pre-coded and modulated acoustic signals, which must be stored in the designed hardware and transmitted with high accuracy and reliability. This work compares two FPGA-based architectural approaches for managing the reading and emission of encoded acoustic signals stored on a microSD card for transmission in underwater environments. On the one hand, a fully hardware implementation based on finite state machines (FSMs) is presented; on the other hand, a soft-core processor is used to manage SD card access. The comparative analysis of both implementations focuses on performance, resource usage, design complexity, and flexibility. Experimental results show that both solutions are functional for underwater acoustic applications, highlighting their advantages and limitations for the design of underwater embedded systems that require robust data handling from microSD memory.
CMOS Micropower Current-Mode Sinh-Domain Filter with Multidecade Tuning
ABSTRACT. A fully differential CMOS current-mode Sinh companding second-order low-pass filter is presented. The main advantages of the proposed filter are low supply voltage requirements, low static power consumption, and a large frequency tuning range. Measurement results of a test chip prototype are presented, showing a frequency tuning range spanning from 50 kHz up to 2 MHz. For a 1.2 MHz bandwidth, the circuit achieves a dynamic range of 99.7 dB and a power consumption of 45 µW using a supply voltage of 1.5 V. The silicon area of the fabricated filter is 0.256 mm².
Improved Modified Zeta Inverter for Single-Phase Grid-Tied System
ABSTRACT. This paper presents a comprehensive analysis and evaluation of the Improved Modified Zeta Inverter (IMZI), designed to interface photovoltaic systems with the single-phase utility grid. The IMZI topology is constructed using two Zeta converters operating in continuous conduction mode. The IMZI output current is controlled using a linear quadratic regulator. This paper presents the qualitative and quantitative analysis of the IMZI, including small-signal modeling and control design. A comparative study involving the IMZI and other modified Zeta inverter topologies is also conducted. The feasibility and performance of the IMZI are verified through computational simulations. The results demonstrate that the IMZI injects current into the utility grid with low harmonic distortion and achieves a conversion efficiency close to 92%.
Efficient Neural Architectures for Acoustic Monitoring of Livestock
ABSTRACT. This paper evaluates the performance of three convolutional neural network (CNN) architectures (YAMNet, a VGGish-based CNN, and a custom-designed lightweight CNN) for classifying goat vocalizations. All models were assessed under the same experimental conditions, using consistent log-Mel spectrogram representations and a shared data augmentation strategy. The results show that all architectures achieve comparable accuracy levels above 82%, with the VGGish-based CNN reaching the highest performance (82.75%). However, this model also has a high computational cost, requiring approximately 55 million parameters. In contrast, the custom CNN achieves similar accuracy (82.40%) while using only 406,408 parameters. These findings highlight the effectiveness of compact, application-oriented CNNs, which offer strong classification performance with significantly reduced computational requirements. Such models are particularly suitable for real-time, energy-efficient deployment in the resource-constrained environments typical of Precision Livestock Farming (PLF).
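A minimal sketch of the shared log-Mel front end is shown below; the sampling rate, FFT size, hop length, and number of mel bands are assumed values for illustration, since they are not given in the abstract.

import librosa
import numpy as np

def log_mel(path, sr=16000, n_mels=64, n_fft=1024, hop_length=512):
    """Illustrative log-Mel front end (parameter values are assumptions)."""
    y, sr = librosa.load(path, sr=sr, mono=True)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=n_fft,
                                         hop_length=hop_length, n_mels=n_mels)
    return librosa.power_to_db(mel, ref=np.max)  # shape: (n_mels, frames)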
1-D Convolutional Autoencoder for Fetal and Maternal ECG Classification Oriented to Hardware Implementation Acceleration
ABSTRACT. This paper presents a Deep Learning-based method for fetal and maternal heart rate monitoring, specifically designed for efficient hardware implementation. The proposed approach minimizes computational load by eliminating the denoising and filtering stages. The abdominal electrocardiogram (aECG) signal is thus segmented into 100 ms windows, which are processed by a Convolutional Neural Network (CNN) suitable for real-time implementation on edge devices. This design enables low-latency, continuous heart rate monitoring during pregnancy, supporting fetal well-being assessment and early detection of anomalies. In addition to its suitability for real-time applications, the method can detect fetal arrhythmias, thus providing valuable clinical insights during prenatal care.
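The segmentation step described above amounts to simple window slicing; the sketch below assumes a sampling rate and non-overlapping windows, neither of which is specified in the abstract.

import numpy as np

def segment_100ms(aecg, fs=1000, overlap=0.0):
    """Split an aECG recording into 100 ms windows.

    fs and overlap are illustrative assumptions; the abstract only states
    the 100 ms window length. The recording is assumed to span at least
    one full window.
    """
    win = int(0.1 * fs)                       # samples per 100 ms window
    step = max(1, int(win * (1.0 - overlap)))
    n = (len(aecg) - win) // step + 1
    return np.array([aecg[i * step:i * step + win] for i in range(n)])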
Approximate Circuits versus Quantization for Energy Efficient Deep Neural Networks
ABSTRACT. Deep neural networks dominate the landscape of artificial intelligence models and are used in many applications, but their high computational complexity makes executing them in real time very costly in terms of power. Hence, custom energy-efficient accelerators have become a need in some domains. Low-precision integer arithmetic and approximate compute circuits are two popular optimizations often contemplated for saving hardware resources and power consumption. These techniques are usually considered separate and independent approximations, but in reality they are inextricably linked. In this work, we explore the interaction and trade-off between quantization and approximate circuits in the context of deep neural network acceleration by evaluating several circuits, including approximate multipliers, approximate adders, and combinations thereof, while using different integer precisions. Additionally, we study how approximate multiplier and adder circuits can be combined to further push energy efficiency. We use the YOLOv3 object detection network to assess the accuracy impact of the circuits in a state-of-the-art complex deep learning model. By combining approximate arithmetic circuits with low-precision quantization we are able to generate approximate MAD circuits with over 60% less power consumption and near-identical accuracy compared to using only quantization. Nevertheless, we find that quantization plays a dominant role in the resulting energy efficiency, since the best design points are always found at the lowest bit precisions.
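A generic illustration of the interplay studied here, pairing uniform integer quantization with a crude truncation-based approximate multiplier; the specific approximate circuits and precisions evaluated by the authors are not reproduced, and the error metric below is only for demonstration.

import numpy as np

def quantize(x, bits=8):
    """Symmetric uniform quantization of a real-valued tensor to signed integers."""
    scale = np.max(np.abs(x)) / (2 ** (bits - 1) - 1)
    q = np.clip(np.round(x / scale), -(2 ** (bits - 1)), 2 ** (bits - 1) - 1)
    return q.astype(np.int32), scale

def approx_mult(a, b, drop_bits=2):
    """Truncation-based approximate multiplier: discard low-order operand bits.

    Real approximate multipliers act on partial products in hardware; dropping
    low-order bits is a crude software stand-in for illustration.
    """
    return (a >> drop_bits) * (b >> drop_bits) << (2 * drop_bits)

w, sw = quantize(np.random.randn(4, 4))
x, sx = quantize(np.random.randn(4))
exact = w @ x
approx = np.array([sum(approx_mult(int(wi), int(xi)) for wi, xi in zip(row, x))
                   for row in w])
print("relative error:", np.linalg.norm(exact - approx) / np.linalg.norm(exact))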
Character Recognition Application of a Neural Circuit Including Lateral Inhibitory Mechanisms
ABSTRACT. This article presents the hardware implementation of a neural unit using memristive devices as synapses. To model the neuronal behavior of a nerve, that is, a coherent set of neurons, we introduce two well-known mechanisms from biology: lateral inhibition and threshold adaptation to repetitive excitations. Detailed circuits are shown, along with their behavior based on simulations, and a high-level model that enables the study of more complex systems. This model demonstrates the ability to learn and recognize characters, using lateral inhibition as a key driving mechanism.
A 0.78 TOPS/W 180nm Stochastic Computing-based Neuromorphic Circuit
ABSTRACT. This paper presents the design, fabrication, and evaluation of a Morphological Neural Network (MNN) implemented in a 180nm CMOS technology using a hybrid approach that combines stochastic computing and classical binary arithmetic. The architecture efficiently implements max, min, and product operations using simple logic gates, while additions are handled with approximate binary adders to reduce power and area. The chip was tested under various voltage and frequency conditions, showing a stable classification accuracy of 92.5% on the MNIST dataset and operational limits below 0.81 V. The measured energy efficiency reaches 0.779 TOPS/W, matching or even outperforming other AI accelerators built in more advanced nodes. These results demonstrate that MNNs combined with stochastic logic provide a compact and energy-efficient solution for edge AI applications.
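For readers unfamiliar with stochastic computing, the gate-level primitives mentioned above can be sketched on unipolar bitstreams as follows; the stream length is an arbitrary choice, and the paper's hybrid stochastic/binary datapath is more elaborate than this illustration.

import numpy as np

rng = np.random.default_rng(0)
N = 4096  # bitstream length (arbitrary illustrative value)

def to_stream(p, shared_rand=None):
    """Encode a probability p in [0, 1] as a unipolar stochastic bitstream."""
    r = shared_rand if shared_rand is not None else rng.random(N)
    return (r < p).astype(np.uint8)

a, b = 0.7, 0.4
r = rng.random(N)
# Independent streams: AND approximates the product a*b
prod = np.mean(to_stream(a) & to_stream(b))
# Correlated streams (same random source): AND -> min, OR -> max
mn = np.mean(to_stream(a, r) & to_stream(b, r))
mx = np.mean(to_stream(a, r) | to_stream(b, r))
print(prod, mn, mx)   # ~0.28, ~0.4, ~0.7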
A Comparative Analysis of Bipolar and Sign-Magnitude Stochastic Computing Approaches in Quantized Neural Networks
ABSTRACT. Stochastic computing methods are a promising approach for energy-efficient neural network inference in edge and low-power environments. By representing fixed-point or real-valued data as bitstreams and performing arithmetic using simple logic gates, these methods enable highly compact and fault-tolerant hardware implementations. However, their probabilistic nature introduces challenges related to numerical accuracy, especially under low-precision constraints. This work builds upon prior stochastic-computing-aware training by applying a fixed quantization method across multiple bit widths to compare bipolar and sign-magnitude (a.k.a. two-wire bipolar) encoding schemes. We compare equivalent neural network models to evaluate the impact of each approach on accuracy degradation for different weight and activation bit widths, and explore the corresponding hardware implications for neural network inference in terms of estimated FPGA resources and energy efficiency. Our results show that the sign-magnitude SC model achieves accuracy nearly equivalent to fixed-point inference across most configurations, while also halving latency due to shorter bitstreams. In contrast, the bipolar model exhibits greater degradation at lower bit widths. These findings highlight the advantages of sign-magnitude encoding for SC-based inference and motivate future work on hardware implementations and on evaluation with more complex architectures and datasets.
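The two encodings being compared can be summarized with the following sketch of multiplication under each scheme for values in [-1, 1]; the stream length and test values are arbitrary illustrative choices, not those used in the paper.

import numpy as np

rng = np.random.default_rng(1)
N = 8192  # illustrative stream length

def bipolar_encode(x):
    """Bipolar SC: x in [-1, 1] -> bitstream with P(1) = (x + 1) / 2."""
    return (rng.random(N) < (x + 1) / 2).astype(np.uint8)

def bipolar_decode(s):
    return 2 * s.mean() - 1

def bipolar_mult(sa, sb):
    """Bipolar multiplication is an XNOR of the two streams."""
    return 1 - (sa ^ sb)

def sm_encode(x):
    """Sign-magnitude SC: separate sign bit plus unipolar stream with P(1) = |x|."""
    return (x < 0), (rng.random(N) < abs(x)).astype(np.uint8)

def sm_mult(a, b):
    (sa, ma), (sb, mb) = a, b
    return (sa ^ sb), ma & mb          # XOR the signs, AND the magnitudes

x, y = -0.5, 0.8
print(bipolar_decode(bipolar_mult(bipolar_encode(x), bipolar_encode(y))))  # ~ -0.4
s, m = sm_mult(sm_encode(x), sm_encode(y))
print((-1 if s else 1) * m.mean())                                         # ~ -0.4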
Three decades of IMSE Neuromorphic Engineering Group
ABSTRACT. In this paper, we discuss the evolution of neuromorphic technology from its origins and the beginning of the IMSE neuromorphic group's activity in the pioneering CAVIAR project to the present day, together with its prospective future development. The IMSE neuromorphic group coordinated the pioneering EU-FP5 CAVIAR project, which demonstrated the potential of neuromorphic technology for implementing low-power, high-speed sensing, computing, and actuation systems. Within the CAVIAR project, the first Dynamic Vision Sensor (DVS), the first spiking convolution CMOS chip, and the first multi-module closed-loop sensing-processing-control-learning system were demonstrated. Further developments in the group have included new DVS cameras, new spiking processors, and new CMOS chips and systems combining CMOS neurons with emerging synaptic devices (RRAM- and ferroelectric-based memristors) exhibiting biologically plausible spike-time-dependent plasticity learning rules.
Full-Integer Spiking Neural Network Inference with RISC-V ISA Extensions for Radar-based Gesture Recognition
ABSTRACT. Spiking neural networks (SNNs) offer energy-efficient alternatives to conventional artificial neural networks (ANNs), making them suitable for real-time inference on resource-constrained edge devices. However, the reliance on floating-point operations (FLOPs) in spiking neuron states and dynamics often limits their applicability on hardware platforms without floating-point support and impacts the inference performance. In this work, we present hardware-aware optimizations to reduce the computational complexity, coupled with a full-integer SNN inference solution that eliminates FLOPs for radar-based hand gesture recognition (HGR). Furthermore, with the aim of developing a RISC-V-based SNN accelerator, we present custom extensions to the RISC-V instruction set architecture (ISA). The proposed solution achieves an ≈ 32x overall speedup compared to the floating-point counterpart, of which ≈ 11.5x comes from the hardware-aware optimizations with the full-integer solution and an extra ≈ 2.8x from the custom RISC-V ISA extensions. The results highlight the feasibility of a full-integer SNN inference solution for a non-trivial HGR problem, and the potential of the RISC-V-based SNN accelerator to enable efficient SNN inference.
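A minimal sketch of the kind of integer-only spiking-neuron update such a solution relies on; the leak factor, threshold, and fixed-point format below are assumptions for illustration and do not reproduce the paper's neuron model or ISA extensions.

import numpy as np

# Q8 fixed-point parameters (assumed values for illustration)
SHIFT = 8                          # fractional bits
DECAY = int(0.9 * (1 << SHIFT))    # leak factor ~0.9 in fixed point
THRESHOLD = 64 << SHIFT            # firing threshold

def lif_step(v, input_current):
    """One leaky integrate-and-fire step using only integer operations.

    v and input_current are int32 arrays in Qx.8 fixed point; no FLOPs are used.
    """
    v = (v * DECAY) >> SHIFT           # leak (multiply + shift instead of float)
    v = v + input_current              # integrate
    spikes = (v >= THRESHOLD).astype(np.int32)
    v = np.where(spikes == 1, 0, v)    # reset membrane potential after a spike
    return v, spikes

v = np.zeros(4, dtype=np.int32)
for t in range(5):
    v, s = lif_step(v, np.array([30, 10, 5, 0], dtype=np.int32) << SHIFT)
    print(t, s)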
Design Space Exploration of FPGA-Based Spiking Neural Networks for Angle of Arrival Detection
ABSTRACT. This paper presents a comprehensive design-space exploration of Spiking Neural Network (SNN) architectures for Angle-of-Arrival (AoA) estimation, a key challenge in Radio-Frequency (RF) signal processing. While traditional algorithms and conventional Artificial Neural Networks (ANNs) have been successfully implemented on hardware accelerators, the potential of SNNs remains largely untapped. By implementing and testing various compact network architectures on reconfigurable hardware, we analyze the trade-offs between estimation accuracy, resource consumption, performance, and energy efficiency. Our findings reveal that even small-scale SNNs can deliver competitive precision, positioning them as promising candidates for low-power, real-time embedded applications. This work highlights the advantages of adopting neuromorphic computing paradigms in RF systems and opens new avenues for further research.
Analyzing Linux System Call Variability: Real-Time Patch Impact and System Call Monitoring
ABSTRACT. State-of-the-art safety-critical systems are increasingly integrating advanced functionality that requires high computational power, such as the pedestrian detection required by autonomous vehicles. Consequently, high-performance embedded platforms are becoming increasingly necessary. In this context, the use of Linux is highly attractive to industry due to its extensive ecosystem (platform support, AI libraries, etc.) and its open-source development model. However, Linux was not designed to comply with strict safety standards, which complicates its use in safety-critical systems. Previous works have studied the nondeterminism of Linux kernel system calls regarding their execution paths and execution times, and proposed alternative approaches to justify its use in such systems. In this work, we continue those efforts with two main contributions. First, we compare a regular Linux kernel with the kernel patched with the PREEMPT_RT real-time patch, and show how the patch reduces the variability of system calls, both in terms of timings and execution paths. Then, we propose an additional layer of assurance in the form of a trie-based monitor implemented in hardware, which ensures that the variability measured and estimated during testing holds when the system is fielded, both for execution paths and execution times. We implement a software prototype of the monitor to demonstrate its feasibility and discuss our plan to migrate it to hardware.
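A minimal software sketch of the trie-based monitoring idea, assuming system calls are observed as named sequences; the hardware monitor described above additionally checks execution times, which this illustration omits.

class TrieNode:
    def __init__(self):
        self.children = {}
        self.terminal = False   # marks the end of a system-call path seen in testing

class SyscallTrie:
    """Store the system-call sequences observed during testing and flag deviations."""

    def __init__(self):
        self.root = TrieNode()

    def record(self, sequence):
        node = self.root
        for syscall in sequence:
            node = node.children.setdefault(syscall, TrieNode())
        node.terminal = True

    def is_known(self, sequence):
        node = self.root
        for syscall in sequence:
            if syscall not in node.children:
                return False     # unseen execution path -> raise an alarm
            node = node.children[syscall]
        return node.terminal

monitor = SyscallTrie()
monitor.record(["openat", "read", "close"])
print(monitor.is_known(["openat", "read", "close"]))   # True
print(monitor.is_known(["openat", "mmap", "close"]))   # False: deviates from testing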
HPC Workload Analysis Using Distributed Cross-ISA Binary Instrumentation
ABSTRACT. Developing distributed High-Performance Computing (HPC) applications is challenging, with complex interactions between the application, the runtime environment, the processing cores, and the network to obtain the highest performance that a given distributed computing system can provide. HPC systems are evolving at a fast pace, so applications must often be ported. Generally, developers natively run their applications on current machines and extrapolate the performance to future ones. However, modern and future HPC machines contain multiple nodes, each with multiple general-purpose processor cores, possibly with an Instruction Set Architecture (ISA) different from that of previous generations, as well as new domain-specific accelerators, so simple extrapolations may not be accurate. Instead, we propose an automated approach to execute and non-intrusively characterize distributed HPC applications on a QEMU-based, cross-ISA, distributed simulation platform. As part of this automated approach, we propose a QEMU plugin to extract metrics at runtime during the execution of distributed applications. The approach is demonstrated on a RISC-V-based distributed multi-node architecture. It achieves an average speedup of almost 3.5x on a single host machine with 16 virtual nodes in comparison with a single node. Using QEMU plugins to collect MPI runtime metrics slows the simulation down by 1.62x on average, but overall our approach remains much faster than other simulation platforms.
Exploring Design Spaces in Embedded Systems: An Approach Based on Genetic Programming, Particle Swarm and Reinforcement Learning
ABSTRACT. Efficient hardware design for information processing in embedded systems is essential in applications requiring high speed and low energy consumption, such as signal and image processing or edge artificial intelligence. However, these systems face significant constraints in terms of computational resources and energy consumption, especially when operating on battery power. Additionally, manual configuration of hyperparameters to achieve optimal performance is often a lengthy and complex process.
This work presents a comparative study of different hyperparameter search and optimization techniques applied to the design of image processing pipelines in embedded devices. The study considers both the quality of the results obtained and the resources used for generating the reconfigurable hardware. This evaluation is key to maximizing the quality of results while minimizing resource consumption on the device.
To validate the functionality of the proposed system, experiments were conducted to compare the results obtained by the different techniques against a production-level edge image processing system, using a real-world dataset in the context of smart agriculture.
The results showed superior performance of the evolutionary approaches, especially our developed algorithm, which achieved a good balance between accuracy and resource usage. In contrast, the reinforcement learning methods failed to converge effectively, and particle swarm optimization exhibited exploratory limitations, highlighting the suitability of evolutionary techniques for resource-constrained embedded systems.
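A minimal sketch of the evolutionary search loop underlying such approaches, with an arbitrary search space and a placeholder fitness function; the authors' genetic-programming algorithm, pipeline encoding, and evaluation metric are not reproduced here.

import random

SEARCH_SPACE = {                      # illustrative hyperparameters, not the paper's
    "kernel_size": [3, 5, 7],
    "threshold": [0.2, 0.4, 0.6, 0.8],
    "stages": [1, 2, 3, 4],
}

def random_candidate():
    return {k: random.choice(v) for k, v in SEARCH_SPACE.items()}

def fitness(cand):
    """Placeholder fitness: in the real system this would run the image pipeline
    on the dataset and combine accuracy with FPGA resource usage."""
    return -abs(cand["kernel_size"] - 5) - abs(cand["threshold"] - 0.4) + cand["stages"] * 0.1

def evolve(pop_size=20, generations=30, mutation_rate=0.2):
    population = [random_candidate() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            child = {k: random.choice([a[k], b[k]]) for k in SEARCH_SPACE}   # crossover
            if random.random() < mutation_rate:                              # mutation
                k = random.choice(list(SEARCH_SPACE))
                child[k] = random.choice(SEARCH_SPACE[k])
            children.append(child)
        population = parents + children
    return max(population, key=fitness)

print(evolve())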