DCIS2025: 40TH CONFERENCE ON DESIGN OF CIRCUITS AND INTEGRATED SYSTEMS
PROGRAM FOR WEDNESDAY, NOVEMBER 26TH

09:00-10:30 Session 2

Keynote: Roger Espasa

10:30-11:00 Coffee Break & Posters
11:00-12:30 Session 3A: Hardware and Software for RISC-V
11:00
A proof-of-concept ASIC RISC-V based SoC for Industrial Applications

ABSTRACT. This work presents SoC4cris_p1, a proof-of-concept ASIC implementation of a RISC-V-based System-on-Chip (SoC) tailored for Industrial Internet of Things (IIoT) applications. Built on the neorv32 RISC-V HDL core, SoC4cris_p1 prioritizes execution safety and deterministic behavior. Key enhancements include a redesigned memory map for improved internal memory usage, an extended boot system, optimized peripherals, and integration of four experimental IPs from XXX centers to support IIoT-specific functions. The paper covers architectural modifications, front-end FPGA prototyping, back-end ASIC design using 65nm UMC technology, and validation using a custom test board.

11:30
Performance Analysis of Convolution Function for IA Edge Computing Acceleration using a 32-bit RISC-V CPU Implementation

ABSTRACT. This work presents a Convolution IP coprocessor (CIP) for RISC-V CPUs that accelerates the convolution operation between a 2x2 kernel and a 2x2 portion of the input layer. This processing is the basis of many artificial vision algorithms required in industry. Thanks to the open RISC-V instruction set and its flexible architecture, it is feasible to explore different ways to accelerate the heavy computations that currently hinder the deployment of advanced vision processing on the resource-constrained semiconductor devices widely used in industrial applications. The CIP has been included in a custom RISC-V-based SoC prototype named SoC4cris, which is implemented on FPGA and 65nm UDSM technologies. The CIP is connected to the CPU via the AXI4-Lite external bus interface in the SoC. It has been implemented using fully combinational logic to achieve maximum computing throughput. The CIP performance is compared to the CPU performance in terms of the number of clock cycles needed to complete a full convolution layer computation. The same test is executed with and without enabling specific RISC-V extensions for integer and floating-point arithmetic to analyze the trade-off between performance and hardware resources. The results of our performance analysis demonstrate the exceptional efficiency of the CIP coprocessor. It improves computation performance by a factor of six while introducing no considerable overhead in terms of hardware resources. This underscores the robustness and reliability of our design.
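As an illustration of the operation this abstract describes, the following is a minimal software sketch of a 2x2 convolution layer. It is not the authors' implementation. The function name `conv2x2`, the stride of 1, the absence of padding, and the integer data are all assumptions made for the example. As is common in CNN contexts, the kernel is applied without flipping.

```python
def conv2x2(layer, kernel):
    """Slide a 2x2 kernel over a 2-D layer (stride 1, no padding).

    Hypothetical reference model: each output element is the sum of four
    products between the kernel and one 2x2 window of the input layer.
    """
    rows, cols = len(layer), len(layer[0])
    out = []
    for r in range(rows - 1):
        out_row = []
        for c in range(cols - 1):
            # Four multiply-accumulate steps per window; this is the part
            # a combinational coprocessor could evaluate in one step.
            acc = (layer[r][c] * kernel[0][0]
                   + layer[r][c + 1] * kernel[0][1]
                   + layer[r + 1][c] * kernel[1][0]
                   + layer[r + 1][c + 1] * kernel[1][1])
            out_row.append(acc)
        out.append(out_row)
    return out


example_layer = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
example_kernel = [[1, 0], [0, 1]]
result = conv2x2(example_layer, example_kernel)
```

A software loop like this spends several instructions per window, which is the kind of cost the abstract measures in clock cycles against the coprocessor.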

12:00
RISCV-SLIC: Rust Software Level Interrupt Controller for RISCV microcontrollers

ABSTRACT. Rust crate for enabling vectored handling of software interrupts for RISC-V targets inspired by PLIC.

11:00-12:30 Session 3B: Sensing
11:00
An Improved Discrete Time Amplifier-Less Potentiostat Architecture for Metabolic Sensing Applications

ABSTRACT. In this paper, an improved discrete-time potentiostat architecture is proposed. The focus is placed on the digital controller, aiming to enhance its performance, reduce measurement uncertainty and increase speed. The system is modeled including its nonlinear behavior, and the impact of the digital controller parameters on the system response is analyzed through behavioral simulations. The developed mathematical model shows good agreement with the simulation results. The new controller successfully reduces the measurement uncertainty compared to a purely integral control system by up to a factor of 100 in the worst-case scenario. The system speed is also improved by approximately 40%. The new digital controller improves the overall performance of the system without significantly increasing its hardware complexity.

11:30
Learning to Sense Sustainably: RL-Based Control for Solar-Powered IoT Nodes

ABSTRACT. Operating energy-harvesting IoT nodes in real-world, non-stationary environments poses critical challenges due to unpredictable energy availability and event-driven sensing needs. To address this, we propose a reinforcement learning-based approach using Proximal Policy Optimization (PPO) and a LightGBM-powered solar energy forecasting model to dynamically control sensor activation and system wake intervals. Our method maintains the battery level within an optimal range, balancing system longevity and sensing performance, by using a multi-objective reward function that integrates sensing coverage, energy usage, and battery health, together with predictive solar input to guide policy learning. Results show that PPO agents trained with solar forecasts can achieve near-optimal trade-offs between energy efficiency and sensing fidelity. This approach demonstrates that predictive, learning-based control is a powerful and scalable solution for autonomous energy management in next-generation IoT deployments.

12:00
A built-in CMOS temperature sensor for On-Chip Thermal Monitoring from 0 °C to 100 °C with 0.137 °C of Inaccuracy

ABSTRACT. This work presents a temperature sensor designed in a 22 nm process using the 22-FDX design kit. The sensor exploits the temperature dependence of the threshold voltage and carrier mobility for temperature-to-voltage conversion, covering a temperature range of 0 °C to 100 °C. A key contribution of this work is the exceptionally low error of +0.137 °C and –0.075 °C achieved after a simulated one-point calibration. In addition, the device achieves a temperature coefficient of -1.69 mV/°C over the whole temperature range. It consumes 23.11 μA from a 0.8 V DC supply and has an estimated die area of 5024.9 μm² (based on the schematic), of which approximately 5000 μm² corresponds to two MOM capacitors. These capacitors are required for stand-alone measurement, but in real-life scenarios, part of this capacitance corresponds to the load capacitance seen when an ADC is connected to the sensor’s output. This temperature sensor is suitable for high-accuracy on-chip applications, standing out for its trade-off between area, temperature range, power consumption, error and temperature coefficient.

12:30-14:00 Session 4A: Security and Power Systems
12:30
Electromagnetic Side-Channel Attack on a Cloud-Based Fingerprint Recognition System

ABSTRACT. Cloud-based biometric recognition systems have gained widespread adoption across several sectors due to their advantages in terms of cost-efficiency, scalability and performance. In these systems, the raw fingerprint images collected by sensors are transmitted to cloud servers through secure channels. The servers often use SoC-FPGAs to accelerate processing in hardware, and they incorporate advanced security measures. However, little attention is paid to side-channel attacks that can retrieve biometric data at the operation level. To raise awareness of this problem, this paper presents an electromagnetic (EM) side-channel attack on a PYNQ Z1 board. It is performed while the SoC-FPGA is reading the fingerprint image from the DDR3 memory. Fuzzy-logic-based rules are extracted during a training phase to explain the correlation between the measured electromagnetic emanations and the transmitted pixels. With that rule base, the attacker needs only one EM trace, acquired in less than 44.37 s, to reconstruct the fingerprint image, reaching a pixel-wise accuracy of 99.05%.

13:00
Low Entropy Masking Protection Scheme for ASCON Cipher to Counteract Side-Channel Attacks

ABSTRACT. Since NIST selected the ASCON cipher as a finalist in the lightweight cipher competition for constrained environments in February 2023, the cipher has been a focus of researchers, industry and government. On the other hand, hardware-implemented cryptographic algorithms have had to deal with so-called Side-Channel Attacks (SCA) since their emergence in the late 1990s. Although the ASCON algorithm is relatively recent, SCA attacks that breach its security have been proposed in the literature. In this paper, we present a design methodology for a low entropy masking protection scheme in order to raise the ASCON algorithm's security levels against SCA. To evaluate the proposed methodology, the ASCON's permutation has been implemented in an Artix-7 Xilinx FPGA. The implemented design area overhead is 5.45% with respect to the unprotected implementation. A complete ASCON algorithm has been manufactured in a 65nm TSMC ASIC technology. To perform experimental SCA attacks on the ASCON ASIC, a PCB has been designed and manufactured to specifically perform power measurements on the ASIC core.

13:15
A Lightweight AES Peripheral for RISC-V Cores and IoT Applications

ABSTRACT. In this article, we present a lightweight peripheral for the Advanced Encryption Standard (AES) algorithm, suitable for implementation as a memory-mapped peripheral in RISC-V cores. The peripheral is based on an 8-bit serial implementation of AES, which achieves a drastic reduction in the time required to encrypt a message with only a small increase in resource consumption. The peripheral is compared in terms of resource utilization and timing with a software implementation of AES, tinyAES-c, and with a hardware implementation that employs a more common 128-bit datapath, using the Series-7 FPGA technology of the manufacturer AMD-Xilinx. The results show that the peripheral is 71.84 times faster than the software implementation, at the cost of a 46.37% increase in logic relative to the RISC-V core that hosts it; the AES peripheral itself consumes only 475 LUTs.

13:45
Electric vehicle emulator for study as a Distributed Energy Resource

ABSTRACT. The integration of electric vehicles (EVs) into smart grids as distributed energy resources (DERs) is a key aspect of modern energy management. This paper presents the development and implementation of an electric vehicle emulator capable of simulating bidirectional energy flows in vehicle-to-everything (V2X) scenarios. The emulator is designed to analyze the impact of EVs on grid stability, demand response, and energy storage management. The system integrates a battery pack, power electronic systems, and a real-time control system to emulate the charging and discharging patterns of an EV under different grid conditions. Experimental results demonstrate the potential of the emulator in testing and optimizing control strategies for V2X applications, enhancing grid flexibility, and supporting the transition to renewable energy sources.

12:30-14:00 Session 4B: Signal Processing and Power Systems
12:30
FPGA Architectures for Reliable Transmission of Pre-Stored Acoustic Signals in Underwater Localization Systems

ABSTRACT. Robust transmission of underwater acoustic signals is essential for the development of Internet of Underwater Things (IoUT) applications such as environmental monitoring, marine exploration, and the underwater localization of mobile entities. Underwater localization systems require the emission of pre-coded and modulated acoustic signals, which must be stored in the designed hardware and transmitted with high accuracy and reliability. This work compares two FPGA-based architectural approaches for managing the reading and emission of encoded acoustic signals stored on a microSD card for transmission in underwater environments. On the one hand, a fully hardware implementation using finite state machines (FSMs) is presented; on the other hand, a soft-core processor is used to manage SD card access. The comparative analysis of both implementations focuses on performance, resource usage, design complexity, and flexibility. Experimental results show that both solutions are functional for underwater acoustic applications, and highlight advantages and limitations for the design of underwater embedded systems that require robust data handling from microSD memory.

13:00
CMOS Micropower Current-Mode Sinh-Domain Filter with Multidecade Tuning

ABSTRACT. A fully-differential CMOS current-mode Sinh companding second-order low-pass filter is presented. The main advantages of the proposed filter are low supply voltage requirements, low static power consumption and large frequency tuning range. Measurement results of a test chip prototype are presented, showing a frequency tuning range spanning from 50 kHz up to 2 MHz. For a 1.2 MHz bandwidth, the circuit achieves a dynamic range of 99.7 dB and a power consumption of 45 µW using a supply voltage of 1.5 V. The silicon area of the fabricated filter is 0.256 mm².

13:30
Improved Modified Zeta Inverter for Single-Phase Grid-Tied System

ABSTRACT. This paper presents a comprehensive analysis and evaluation of the Improved Modified Zeta Inverter (IMZI), designed to interface photovoltaic systems with the single-phase utility grid. The IMZI topology is constructed using two Zeta converters operating in continuous conduction mode. The IMZI output current is controlled using a linear quadratic regulator. The paper covers the qualitative and quantitative analysis of the IMZI, including the small-signal modeling and control design. A comparative study involving the IMZI and other modified Zeta inverter topologies is also conducted. The feasibility and performance of the IMZI are verified through computational simulations. The results demonstrate that the IMZI injects current into the utility grid with low harmonic distortion and achieves a conversion efficiency close to 92%.

14:00-15:30 Lunch
15:30-17:00 Session 5A: AI Circuits and Systems
15:30
Efficient Neural Architectures for Acoustic Monitoring of Livestock

ABSTRACT. This paper evaluates the performance of three convolutional neural network (CNN) architectures, YAMNet, a VGGish-based CNN, and a custom-designed lightweight CNN, for classifying goat vocalizations. All models were assessed under the same experimental conditions, using consistent log-Mel spectrogram representations and a shared data augmentation strategy. The results show that all architectures achieve comparable accuracy levels above 82%, with the VGGish-based CNN reaching the highest performance (82.75%). However, this model also exhibits a high computational cost, requiring approximately 55 million parameters. In contrast, the custom CNN achieves a similar accuracy (82.40%) while using only 406,408 parameters. These findings highlight the effectiveness of compact, application-oriented CNNs, which offer strong classification performance with significantly reduced computational requirements. Such models are particularly suitable for real-time, energy-efficient deployment in resource-constrained environments typical of Precision Livestock Farming (PLF).

16:00
1-D Convolutional Autoencoder for Fetal and Maternal ECG Classification Oriented to Hardware Implementation Acceleration

ABSTRACT. This paper presents a Deep Learning-based method for fetal and maternal heart rate monitoring, specifically designed for efficient hardware implementation. The proposed approach minimizes computational load by eliminating the denoising and filtering stages. The abdominal electrocardiogram (aECG) signal is thus segmented into 100 ms windows, which are processed by a Convolutional Neural Network (CNN) suitable for real-time implementation on edge devices. This design enables low-latency, continuous heart rate monitoring during pregnancy, supporting fetal well-being assessment and early detection of anomalies. In addition to its suitability for real-time applications, the method can detect fetal arrhythmias, thus providing valuable clinical insights during prenatal care.

16:30
Approximate Circuits versus Quantization for Energy Efficient Deep Neural Networks

ABSTRACT. Deep neural networks dominate the landscape of artificial intelligence models and are used in many applications, but their high computational complexity makes executing them in real time very costly in terms of power. Hence, custom energy-efficient accelerators have become a need in some domains. Low-precision integer arithmetic and approximate compute circuits are two popular optimizations often contemplated for saving hardware resources and power consumption. These techniques are usually considered as separate and independent approximations, but in reality they are inextricably linked. In this work, we explore the interaction and trade-off between quantization and approximate circuits in the context of deep neural network acceleration by evaluating several circuits, including approximate multipliers, approximate adders, and combinations thereof, while using different integer precisions. Additionally, we study how approximate multiplier and adder circuits can be combined to further push energy efficiency. We use the YOLOv3 object detection network to assess the accuracy impact of the circuits in a state-of-the-art complex deep learning model. By combining approximate arithmetic circuits with low-precision quantization we are able to generate approximate MAD circuits with over 60% less power consumption and near identical accuracy compared to using only quantization. Nevertheless, we find that quantization plays a dominant role in the resulting energy efficiency, since the best design points are always found at the lowest bit precisions.

15:30-17:00 Session 5B: Neuromorphic Circuits, Systems and Technologies I
15:30
Character Recognition Application of a Neural Circuit Including Lateral Inhibitory Mechanisms

ABSTRACT. This article presents the hardware implementation of a neural unit using memristive devices as synapses. To model the neuronal behavior of a nerve, that is, a coherent set of neurons, we introduce two well-known mechanisms from biology: lateral inhibition and threshold adaptation to repetitive excitations. Detailed circuits are shown, along with their behavior based on simulations and a high-level model that enables the study of more complex systems. This model demonstrates the ability to learn and recognize characters, using lateral inhibition as a key dynamizing element.

16:00
A 0.78 TOPS/W 180nm Stochastic Computing-based Neuromorphic Circuit

ABSTRACT. This paper presents the design, fabrication, and evaluation of a Morphological Neural Network (MNN) implemented in a 180nm CMOS technology using a hybrid approach that combines stochastic computing and classical binary arithmetic. The architecture efficiently implements max, min, and product operations using simple logic gates, while additions are handled with approximate binary adders to reduce power and area. The chip was tested under various voltage and frequency conditions, showing a stable classification accuracy of 92.5% for the MNIST dataset problem and operational limits below 0.81 V. The measured energy efficiency reaches 0.779 TOPS/W, matching or even outperforming other AI accelerators built in more advanced nodes. These results demonstrate that MNNs combined with stochastic logic provide a compact and energy-efficient solution for edge AI applications.

16:30
A Comparative Analysis of Bipolar and Sign-Magnitude Stochastic Computing Approaches in Quantized Neural Networks

ABSTRACT. Stochastic computing methods are a promising approach for energy-efficient neural network inference in edge and low-power environments. By representing fixed-point or real-valued data as bitstreams and performing arithmetic using simple logic gates, these methods enable highly compact and fault-tolerant hardware implementations. However, their probabilistic nature introduces challenges related to numerical accuracy, especially under low-precision constraints. This work builds upon prior stochastic computing-aware training by applying a fixed quantization method across multiple bit widths to compare bipolar and sign-magnitude (a.k.a. two-wire bipolar) encoding schemes. We compare equivalent neural network models to evaluate the impact of each approach on accuracy degradation for different weight and activation bit widths, and explore the corresponding hardware implications in neural network inference in terms of estimated FPGA resources and energy efficiency. Our results show that the sign-magnitude SC model achieves accuracy nearly equivalent to fixed-point inference across most configurations, while also reducing latency by half due to shorter bitstreams. In contrast, the bipolar model exhibits greater degradation at lower bit widths. These findings highlight the advantages of sign-magnitude encoding for SC-based inference and motivate future work on hardware implementations and evaluation on more complex architectures and datasets.
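For readers unfamiliar with the two encodings compared in this abstract, the sketch below shows textbook stochastic multiplication in both schemes. It is not the authors' model. The function names, the software random number generator, and the stream length are assumptions chosen for illustration. In bipolar encoding a value x in [-1, 1] is represented by a bitstream with P(bit = 1) = (x + 1) / 2, and an XNOR gate multiplies two independent streams. In sign-magnitude encoding the magnitude is a unipolar stream multiplied with an AND gate, while the sign is handled separately with an XOR.

```python
import random


def bipolar_stream(x, n, rng):
    """Encode x in [-1, 1] as n bits with P(bit = 1) = (x + 1) / 2."""
    p = (x + 1) / 2
    return [1 if rng.random() < p else 0 for _ in range(n)]


def bipolar_decode(bits):
    """Recover the bipolar value from the fraction of ones."""
    return 2 * sum(bits) / len(bits) - 1


def bipolar_multiply(x, y, n, rng):
    """Multiply via bitwise XNOR of two independent bipolar streams."""
    a, b = bipolar_stream(x, n, rng), bipolar_stream(y, n, rng)
    return bipolar_decode([1 - (p ^ q) for p, q in zip(a, b)])


def unipolar_stream(m, n, rng):
    """Encode a magnitude m in [0, 1] as n bits with P(bit = 1) = m."""
    return [1 if rng.random() < m else 0 for _ in range(n)]


def sign_magnitude_multiply(x, y, n, rng):
    """Multiply magnitudes with AND; combine the sign bits with XOR."""
    negative = (x < 0) ^ (y < 0)
    a, b = unipolar_stream(abs(x), n, rng), unipolar_stream(abs(y), n, rng)
    magnitude = sum(p & q for p, q in zip(a, b)) / n
    return -magnitude if negative else magnitude


rng = random.Random(0)  # fixed seed so the run is repeatable
bipolar_estimate = bipolar_multiply(0.5, -0.6, 8192, rng)
sign_magnitude_estimate = sign_magnitude_multiply(0.5, -0.6, 8192, rng)
```

Both estimates approximate the exact product of -0.3, with an error that shrinks as the stream length grows.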

17:00-18:30 Session 6A: System-Level Analysis and Exploration
17:00
Three decades of IMSE Neuromorphic Engineering Group

ABSTRACT. In this paper, we discuss the evolution of neuromorphic technology from its origins and the beginning of the IMSE neuromorphic group's activity in the pioneering CAVIAR project to the present day, as well as its prospective future development. The IMSE neuromorphic group coordinated the pioneering EU-FP5 CAVIAR project, which demonstrated the potential of neuromorphic technology for implementing low-power, high-speed sensing, computing and actuation systems. Within the CAVIAR project, the first Dynamic Vision Sensor (DVS), the first spiking convolution CMOS chip, and the first multi-module closed-loop sensing-processing-control-learning system were demonstrated. Further developments in the group have included new DVS cameras, new spiking processors, and new CMOS chips and systems combining CMOS neurons with emerging synaptic devices (RRAM and ferroelectric-based memristors) exhibiting biologically plausible spike-timing-dependent plasticity learning rules.

17:30
Full-Integer Spiking Neural Network Inference with RISC-V ISA Extensions for Radar-based Gesture Recognition

ABSTRACT. Spiking neural networks (SNNs) offer energy-efficient alternatives to conventional artificial neural networks (ANNs), making them suitable for real-time inference on resource-constrained edge devices. However, the reliance on floating-point operations (FLOPs) in spiking neuron states and dynamics often limits their applicability on hardware platforms without floating-point support and impacts the inference performance. In this work, we present hardware-aware optimizations to reduce the computational complexity, coupled with a full-integer spiking neural network (SNN) inference solution that eliminates FLOPs for radar-based hand gesture recognition (HGR). Furthermore, with the aim of developing a reduced instruction set computer 5th generation (RISC-V) based SNN accelerator, we present custom extensions to the RISC-V instruction set architecture (ISA). The proposed solution achieves an overall speedup of ≈ 32 times compared to the floating-point counterpart, of which ≈ 11.5 times comes from the hardware-aware optimizations with the full-integer solution, and an extra ≈ 2.8 times from the custom RISC-V ISA extensions. The results highlight the feasibility of a full-integer SNN inference solution for a non-trivial HGR problem, and the potential of the RISC-V based SNN accelerator to enable efficient SNN inference.
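The speedup figures quoted in this abstract are consistent with each other if the two contributions compose multiplicatively, which is an assumption about how they were measured. A short check, using a hypothetical helper named `combined_speedup`:

```python
def combined_speedup(*factors):
    """Multiply independent speedup factors applied one after another."""
    total = 1.0
    for factor in factors:
        total *= factor
    return total


# Figures taken from the abstract: full-integer optimizations, then ISA extensions.
overall = combined_speedup(11.5, 2.8)
```

The product is about 32.2, in line with the reported overall speedup of roughly 32 times.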

18:00
Design Space Exploration of FPGA-Based Spiking Neural Networks for Angle of Arrival Detection

ABSTRACT. This paper presents a comprehensive design-space exploration of Spiking Neural Network (SNN) architectures for Angle-of-Arrival (AoA) estimation, a key challenge in Radio-Frequency (RF) signal processing. While traditional algorithms and conventional Artificial Neural Networks (ANNs) have been successfully implemented on hardware accelerators, the potential of SNNs remains largely untapped. By implementing and testing various compact network architectures on reconfigurable hardware, we analyze the trade-offs between estimation accuracy, resource consumption, performance, and energy efficiency. Our findings reveal that even small-scale SNNs can deliver competitive precision, positioning them as promising candidates for low-power, real-time embedded applications. This work highlights the advantages of adopting neuromorphic computing paradigms in RF systems and opens new avenues for further research.

17:00-18:30 Session 6B: Neuromorphic Circuits, Systems and Technologies II
17:00
Analyzing Linux System Call Variability: Real-Time Patch Impact and System Call Monitoring

ABSTRACT. State-of-the-art safety-critical systems are increasingly integrating advanced functionality that requires high computational power, such as the pedestrian detection required by autonomous vehicles. Consequently, high-performance embedded platforms are becoming increasingly necessary. In this context, the use of Linux is highly attractive to industry due to its extensive ecosystem (platform support, AI libraries, etc.) and its open-source development model. However, Linux was not designed to comply with strict safety standards, which complicates its use in safety-critical systems. Previous works have studied the nondeterminism of Linux kernel system calls regarding their execution paths and execution times, and proposed alternative approaches to justify its use in such systems. In this work, we continue those efforts with two main contributions. First, we compare a regular Linux kernel with the kernel patched with the PREEMPT_RT real-time patch, and show how the patch reduces the variability of system calls, both in terms of timings and execution paths. Then, we propose an additional layer of assurance in the form of a trie-based monitor implemented in hardware, which ensures that the variability measured and estimated during testing holds when the system is fielded, both for execution paths and execution times. We implement a software prototype of the monitor to demonstrate its feasibility and discuss our plan to migrate it to hardware.
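To make the idea of a trie-based monitor concrete, here is a minimal software sketch. It is not the authors' prototype. The class name `PathTrie`, the step labels, and the rule of bounding each path by the longest time seen during testing are all assumptions for illustration. Execution paths recorded during testing are stored as sequences of steps in a trie. At run time, a path is accepted only if it was seen during testing and its duration does not exceed the recorded bound.

```python
class PathTrie:
    """Trie over execution-path steps, with a time bound per complete path."""

    def __init__(self):
        self.children = {}
        self.terminal = False   # True if a recorded path ends at this node
        self.max_time = 0.0     # longest duration recorded for that path

    def record(self, path, elapsed):
        """Testing phase: store a path and widen its time bound if needed."""
        node = self
        for step in path:
            node = node.children.setdefault(step, PathTrie())
        node.terminal = True
        node.max_time = max(node.max_time, elapsed)

    def check(self, path, elapsed):
        """Deployment phase: accept only known paths within their bound."""
        node = self
        for step in path:
            node = node.children.get(step)
            if node is None:
                return False    # path diverges from everything recorded
        return node.terminal and elapsed <= node.max_time


monitor = PathTrie()
# Hypothetical step labels and timings, in arbitrary units.
monitor.record(["entry", "lookup", "copy", "exit"], 12.0)
monitor.record(["entry", "lookup", "exit"], 5.0)
```

A call such as `monitor.check(["entry", "lookup", "exit"], 4.0)` returns `True`, while an unseen path or a known path that runs too long is rejected.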

17:30
HPC Workload Analysis Using Distributed Cross-ISA Binary Instrumentation

ABSTRACT. Developing distributed High-Performance Computing (HPC) applications is challenging, with complex interactions between application, runtime environment, processing cores, and network to obtain the highest performance that a given distributed computing system can provide. HPC systems are evolving at a fast pace, so applications must often be ported. Generally, developers natively run their applications on current machines and extrapolate the performance on future ones. However, modern and future HPC machines contain multiple nodes, each with multiple general-purpose processor cores, possibly with an Instruction Set Architecture (ISA) different from the previous generations, as well as new domain-specific accelerators, so simple extrapolations may not be accurate. Instead, we propose an automated approach to execute and non-intrusively characterize distributed HPC applications on a QEMU-based, cross-ISA, distributed simulation platform. As part of this automated approach, we propose a QEMU plugin to extract metrics at runtime during the execution of distributed applications.

The approach is demonstrated on a RISC-V-based distributed multi-node architecture. It achieves an average speedup of almost 3.5x on a single host machine with 16 virtual nodes in comparison with a single node. Using QEMU plugins for collecting MPI runtime metrics slows the simulation by 1.62x on average, but overall, our approach remains much faster than other simulation platforms.

18:00
Exploring Design Spaces in Embedded Systems: An Approach Based on Genetic Programming, Particle Swarm and Reinforcement Learning

ABSTRACT. Efficient hardware design for information processing in embedded systems is essential in applications requiring high speed and low energy consumption, such as signal and image processing or edge artificial intelligence. However, these systems face significant constraints in terms of computational resources and energy consumption, especially when operating on battery power. Additionally, manual configuration of hyperparameters to achieve optimal performance is often a lengthy and complex process. This work presents a comparative study of different hyperparameter search and optimization techniques applied to the design of image processing pipelines in embedded devices. The study considers both the quality of the results obtained and the resources used for generating the reconfigurable hardware. This evaluation is key to maximizing the quality of results while minimizing resource consumption on the device. To validate the functionality of the proposed system, experiments were conducted to compare the results obtained by the different techniques against a production-level edge image processing system, using a real-world dataset in the context of smart agriculture. The results showed superior performance of evolutionary approaches—especially our developed algorithm—achieving a good balance between accuracy and resource usage. In contrast, reinforcement methods failed to converge effectively, and particle swarm optimization exhibited exploratory limitations, highlighting the suitability of evolutionary techniques in resource-constrained embedded systems.