Program for Friday, February 28th

09:00-10:30 Session 13A: Arithmetic and Signal Processing Circuits and Systems

Chair:

Mateus Grellert (Federal University of Rio Grande do Sul, Brazil)

Location: Malbec A+B

09:00	Shiva Nejati (University of Windsor, Canada) Moslem Heidarpur (University of Windsor, Canada) Mitra Mirhassani (University of Windsor, Canada) A Novel and Efficient Large Integer Number Theoretic Transform Multiplier Based on Unified Blocks ABSTRACT. This work presents a novel integer multiplication architecture based on the Number Theoretic Transform (NTT) for Post Quantum Cryptography. The proposed NTT introduces an overlapping execution of NTT stages, enabling subsequent stages to commence as soon as the necessary operands from the preceding stage are available. This approach significantly reduces the number of clock cycles required for the NTT operation, leading to lower latency. Furthermore, a unified block is proposed and developed, which enhances memory management for the execution of the NTT operations and reduces the resources required for the implementation. The FPGA implementation of a 512-bit integer multiplier on the Virtex-7 series demonstrates substantial reductions in delay and area of 93.75% and 98%, respectively, defying the typical trade-off between these two metrics.
09:18	Carson Sager (Oklahoma State University, United States) James Stine (Oklahoma State University, United States) Design of a Robust IEEE Compliant Floating-Point Divide and Square Root using Iterative Approximation ABSTRACT. In this paper, we discuss an IEEE 754 compliant normalized floating-point divide and square root unit that utilizes iterative approximation. We provide a robust architecture that supports multiple formats and all IEEE 754 rounding modes while still exhibiting high performance. Moreover, we adhere to the IEEE 754-2019 standard and demonstrate methods for rounding results in all five rounding modes using iterative approximation. Performance, power, and area estimates are determined from physical synthesis using ARM-based standard cells in a TSMC 28nm process. This paper also presents comparisons with other implementations and demonstrates the efficiency of the approach presented here.
09:36	Matheus Lemos (UFRGS, Brazil) Clayton Farias (UFRGS, Brazil) Paulo Butzen (UFRGS, Brazil) José Azambuja (UFRGS, Brazil) Improving Circuit Area with a 7nm Predictive FinFET PDK Multi-Height Standard Cell Library ABSTRACT. Moore’s Law predicts a doubling of transistors every two years, driving semiconductor innovation. To meet this challenge, FinFET technology offers enhanced current control and higher transistor density. This work introduces a 7 nm multi-height standard cell library using FinFETs, which enhances design flexibility by allowing different cell heights. We designed 13 cells, including D-type flip-flops and a 2:1 multiplexer, with up to 50% area reduction compared to a 6-track library. Preliminary results show area reductions of up to 36% in benchmarks, with promising electrical performance despite incomplete parasitic characterization.
09:54	Adriane Duarte (Cefet/RJ, Brazil) Amaro Lima (Cefet/RJ, Brazil) Michel Tcheou (UERJ, Brazil) Maria Medeiros (University of Coimbra, Portugal) Vinícius Silva (UFF, Brazil) Felipe Henriques (Cefet/RJ, Brazil) Material Classification using Optical Wireless Communications Data ABSTRACT. This study proposes the integration of Optical Wireless Communication (OWC) and the classification of the material type of the object whose distance to the laser is being measured, using Machine Learning (ML) techniques such as KNN, RF, and SVM. The application relies on using the OWC dynamic communication structure between vehicles to estimate relevant information in order to improve system communication and navigation. The aim of this work is to classify materials such as glass, plastic and aluminum in order to improve the functionality of the OWC system by expanding the sensory information without adding new hardware, taking advantage of the structure of the communication system already in use. The methodology employs different ML techniques, along with approaches for dealing with limited and unbalanced amounts of data, k-fold and SMOTE, in order to perform a comparative analysis and obtain an efficient classification model. The most significant impact of this study lies in offering an integrated solution that optimizes optical communication while providing material classification for selective information processing. This approach offers a comparative analysis of the ML techniques applied, obtaining an accuracy of 93% using KNN and SMOTE.
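
As background for the 09:00 talk in this session, the following is a minimal software sketch of how an NTT can multiply large integers: split each operand into small limbs, transform, multiply pointwise, transform back, and recombine. It is not the architecture from the paper. The modulus 998244353, the primitive root 3, the 8-bit limb size, and the names `ntt` and `multiply` are all assumptions chosen for illustration.

```python
# Illustrative software sketch of NTT-based integer multiplication.
# Assumed parameters (not from the talk): an NTT-friendly prime and 8-bit limbs.
MOD = 998244353   # 119 * 2**23 + 1, supports transform sizes up to 2**23
ROOT = 3          # a primitive root modulo MOD


def ntt(values, invert=False):
    """Iterative radix-2 NTT over Z_MOD; len(values) must be a power of two."""
    a = list(values)
    n = len(a)
    # Bit-reversal permutation so the butterflies can run in place.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    # log2(n) butterfly stages; each stage doubles the block length.
    length = 2
    while length <= n:
        w_len = pow(ROOT, (MOD - 1) // length, MOD)
        if invert:
            w_len = pow(w_len, MOD - 2, MOD)
        half = length // 2
        for start in range(0, n, length):
            w = 1
            for k in range(start, start + half):
                u = a[k]
                v = a[k + half] * w % MOD
                a[k] = (u + v) % MOD
                a[k + half] = (u - v) % MOD
                w = w * w_len % MOD
        length <<= 1
    if invert:
        n_inv = pow(n, MOD - 2, MOD)
        a = [x * n_inv % MOD for x in a]
    return a


def multiply(x, y, limb_bits=8):
    """Multiply two non-negative integers via a limb-wise NTT convolution."""
    mask = (1 << limb_bits) - 1

    def limbs(v):
        out = []
        while v:
            out.append(v & mask)
            v >>= limb_bits
        return out or [0]

    a, b = limbs(x), limbs(y)
    n = 1
    while n < len(a) + len(b):
        n <<= 1
    fa = ntt(a + [0] * (n - len(a)))
    fb = ntt(b + [0] * (n - len(b)))
    conv = ntt([p * q % MOD for p, q in zip(fa, fb)], invert=True)
    # Recombine: shifting and summing performs the carry propagation.
    return sum(c << (limb_bits * i) for i, c in enumerate(conv))


x = (1 << 511) + 12345
y = (1 << 512) - 1
print(multiply(x, y) == x * y)
```

With 8-bit limbs and 512-bit operands, every convolution coefficient stays far below the modulus, so no result wraps around. Wider limbs would need a larger modulus or several moduli.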

09:00-10:30 Session 13B: Special Session - Approximate and Transprecision Computing: the Quest of More with Less

Chairs:

Carlos Silva Cardenas (Pontificia Universidad Católica del Perú, Peru)
Jorge Castro-Godínez (Costa Rica Institute of Technology, Costa Rica)

Location: Assemblage

09:00	Guilherme Dias (Escola Politécnica, Universidade de São Paulo. INESC-ID, Instituto Superior Técnico, Universidade de Lisboa, Portugal) Luís Crespo (INESC-ID, Instituto Superior Técnico, Universidade de Lisboa, Portugal) Pedro Tomas (INESC-ID, Instituto Superior Técnico, Universidade de Lisboa, Portugal) Nuno Roma (INESC-ID, Instituto Superior Técnico, Universidade de Lisboa, Portugal) Nuno Neves (INESC-ID, Instituto Superior Técnico, Universidade de Lisboa, Portugal) Dynamic Reconfigurable FPU for Next-Generation Transprecision Computing ABSTRACT. Recent limitations in technology scaling have emphasized the need for energy-efficient computing architectures that can dynamically adjust operand precision based on the real-time requirements of the application, without compromising result accuracy. This adaptability also creates opportunities to enhance throughput and optimize hardware utilization. Furthermore, the use of lower-precision formats (e.g., 16-bit) often releases portions of the arithmetic datapath, allowing these resources to be reallocated for increased vector parallelism. In this context, we present a novel transprecision Floating-Point Unit (FPU) that supports all IEEE 754 data types (double, single, half-precision), as well as the bfloat16 and DLFloat formats. The unit features dynamic precision tuning of operands, enabling increased throughput through vectorization and improved energy efficiency. The proposed design was implemented using a 28nm UMC technology process, achieving a peak energy efficiency of 152 GOPS/W as a result of new precision adaptation capabilities.
09:18	Pedro Silva (Universidade Federal de Santa Catarina, Brazil) Rita Louro (Universidade Federal de Santa Catarina, Brazil) Mateus Grellert (Universidade Federal do Rio Grande do Sul, Brazil) Cristina Meinhardt (Universidade Federal de Santa Catarina, Brazil) Power-Efficient Design of Approximate Parallel-Tree Digital Comparators ABSTRACT. With the rise of decision-making circuits in smart edge devices, the demand for energy-efficient Digital Magnitude Comparators has significantly increased. In this work, we propose a method for approximating tree-based comparators to improve energy efficiency and performance. Compared to state-of-the-art exact full-custom and gate-level counterparts, our proposed comparator shows superior power metrics, reducing total power dissipation by up to 21% at the cost of a 0.59% error rate. In a decision tree learning case study, integrating these approximate comparators yields up to a 25% reduction in the power dissipated by comparison operations, highlighting the practical relevance of our approach.
09:36	Rafael dos Santos Ferreira (Federal University of Pelotas (UFPel), Brazil) Luciano Agostini (Federal University of Pelotas (UFPel), Brazil) Cláudio Diniz (Federal University of Rio Grande do Sul (UFRGS), Brazil) Bruno Zatt (Federal University of Pelotas (UFPel), Brazil) Applying Approximate Subtractors for Power Reduction in TZS ABSTRACT. This paper investigates the impact of imprecise subtractors in the hardware architecture of the Sum of Absolute Differences (SAD) computation within the Test Zone Search (TZS) algorithm, commonly used in Versatile Video Coding (VVC). Four state-of-the-art imprecise subtractors (AppS, AXSC1, AXSC2, AXSC3) were analyzed across various video resolutions to assess their influence on computational complexity, energy consumption, and coding efficiency. The results show that subtractors like AppS4 and AXCS14 provide significant reductions in energy consumption with minimal impact on video coding quality. These findings are especially relevant for low-power devices and embedded systems, where energy efficiency is critical. The use of imprecise subtractors offers a promising trade-off between computational efficiency and energy savings, making them a viable solution for high-performance video encoders.
09:54	Vinicius Zanandrea (UFSC, Brazil) Jorge Castro-Godínez (Instituto Tecnológico de Costa Rica, Costa Rica) Cristina Meinhardt (UFSC, Brazil) Ultra-compact Approximate 4:2 Compressor on the Design of Power-efficient Multipliers for Image Multiplication ABSTRACT. This work proposes the adoption of an energy-efficient approximate 4:2 compressor at the transistor level for designing a Dadda tree approximate multiplier. The proposal is also evaluated considering eight other state-of-the-art approximate compressors. Our MAX4:2CV2-based multiplier proposal reduces delay by up to 50.4%, power consumption by up to 59.2%, and Power-Delay Product (PDP) by up to 79.7% compared to an exact multiplier. Furthermore, we evaluate the multiplier quality through pixel-by-pixel image multiplication, where we observe an acceptable result of 31 dB on average for the Peak Signal-to-Noise Ratio (PSNR). These findings highlight that the adoption of the proposed compressor can improve the efficiency of approximate multiplier designs, especially when area and power savings are a critical factor.
10:12	João Bedin (UFRGS, Brazil) Pedro Pereira (UFRGS, Brazil) Rodrigo Wuerdig (KU Leuven / UFRGS, Brazil) Eduardo Da Costa (UCpel, Brazil) Sergio Bampi (UFRGS, Brazil) Quality-Reconfigurable Approximate Multiplier Utilizing Select Leading One-Bit Blocks ABSTRACT. Multipliers are essential for emerging technologies, as they are vital arithmetic circuits in many energy-efficient applications, such as digital signal processing and machine learning. Approximate multipliers (AxM) have become an attractive option for error-tolerant applications in ASIC systems. This paper proposes a run-time reconfigurable AxM with four approximation levels in a single circuit, based on the leading one-bit-based approximate (LoBA) multiplier, and evaluates its accuracy and circuit metrics (e.g., circuit area, timing, and power consumption). The results show a 52% reduction in circuit area and up to 27% less power consumption compared with an equivalent architecture based on the state-of-the-art LoBA. Applying the proposed RLoBA to a normalized least mean square (NLMS) adaptive filter case study, we obtain 22.95% less power consumption than the precise multiplier using dynamic approximation-level selection, while maintaining the same accuracy level.
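
For readers unfamiliar with the PSNR figure quoted in the 09:54 talk, the sketch below shows how such a number can be computed for a pixel-by-pixel image multiplication. Everything in it is an assumption for illustration: `approx_mul` is a hypothetical stand-in that simply zeroes low product bits (it is not the proposed compressor), the pixel values are made up, and the peak value of 255 × 255 is one possible convention for a 16-bit product image.

```python
import math


def approx_mul(a, b, dropped_bits=4):
    """Hypothetical approximate multiplier: zero the low bits of the exact product."""
    return (a * b) >> dropped_bits << dropped_bits


def psnr(reference, test, peak):
    """Peak Signal-to-Noise Ratio in dB between two equal-length pixel sequences."""
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)


# Two tiny made-up 8-bit "images", flattened to lists.
img_a = [12, 200, 37, 255, 90, 141]
img_b = [250, 3, 99, 255, 17, 64]

exact = [a * b for a, b in zip(img_a, img_b)]
approx = [approx_mul(a, b) for a, b in zip(img_a, img_b)]
print(f"PSNR = {psnr(exact, approx, peak=255 * 255):.1f} dB")
```

Higher PSNR means the approximate result is closer to the exact one. Identical images give an infinite PSNR.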

09:00-10:30 Session 13C: Amplifiers, Multipliers and Rectifiers

Chair:

Gilles Jacquemod (University of Nice - Sophia Antipolis, France)

Location: Malbec C

09:00	Riccardo Della Sala (Sapienza, Italy) Orazio Aiello (University of Genoa, Italy) Giuseppe Scotti (Sapienza University of Rome, Italy) A Novel 0.62 nW Inverter based Digital-OTA ABSTRACT. In this paper, we present a novel ultra-low-voltage (ULV) operational transconductance amplifier (OTA) topology inspired by the DIG-OTA. The proposed amplifier architecture leverages the principles of the conventional DIG-OTA while incorporating an inverter-based common-mode feedback (CMFB) loop and an inverter-based output stage. Designed using TSMC’s 180 nm CMOS process, the proposed architecture achieves a gain of 49 dB, a gain-bandwidth product of about 3.72 kHz, and a phase margin of 55 degrees, with an output load capacitance of only 5 pF that can be integrated on-chip. The CMFB mechanism implemented here ensures a common-mode rejection ratio (CMRR) as high as 65 dB, which remains stable across process, supply voltage, and temperature (PVT) variations. Additionally, the power consumption of the proposed OTA is remarkably low at just 0.62 nW. These characteristics place the proposed OTA at the state of the art among ULV OTAs.
09:18	Cristina Adornes (Universidade Federal de Santa Catarina, Brazil) Gabriel Maranhão (Universidade Federal de Santa Catarina, Brazil) Deni Alves (Universidade Federal de Santa Catarina, Brazil) Cesar Rodrigues (Universidade Federal de Santa Catarina, Brazil) Márcio Schneider (Universidade Federal de Santa Catarina, Brazil) A CMOS instrumentation amplifier designed with open-source tools ABSTRACT. This paper presents a CMOS instrumentation amplifier based on a fully differential difference amplifier (FDDA) as part of a bioimpedance readout circuit for skin cancer detection. Developed using the open-source SkyWater 0.13 μm CMOS process from design to tape-out, the FDDA achieves a DC gain of 72 dB, a gain-bandwidth product of 47.8 MHz, an input-referred noise of 0.275 μV/√Hz at 1 kHz, and a CMRR of 119.9 dB. This work highlights the potential of open-source design flows in developing high-performance FDDA circuits, paving the way for more accessible development of advanced biomedical applications.
09:36	Zhiwei Ma (Universite Cote d'Azur, Polytech'Lab, UPR UniCA 7498, France) Gilles Jacquemod (University of Nice - Sophia Antipolis, France) Yoann Charlon (Universite Cote d'Azur, Polytech'Lab, UPR UniCA 7498, France) New multiplier and input layer in current mode for analog artificial neural networks ABSTRACT. This paper presents a new implementation of a multiplier and of the input layer of an analog ANN (Artificial Neural Network), made possible by FDSOI (Fully Depleted Silicon On Insulator) technology. The architecture of the circuit is based on the MLP (Multi-Layer Perceptron) algorithm with back-propagation. The analog implementation of such an algorithm typically uses multipliers, which consume considerable silicon area and power. A second drawback of this topology concerns the storage of the weights. To overcome these problems, we propose new current mirrors that function as multipliers and take advantage of FDSOI technology. Using a similar approach, we have realized the input layer using current mirrors without digital-to-analog converters, reducing both silicon area and power consumption. This dual reduction will eventually enable us to implement a much larger number of neurons, thus increasing the complexity of the final artificial neural network.
09:54	Claudio Pedroso (Universidade Federal do Pampa, Brazil) Paulo Aguirre (Universidade Federal do Pampa, Brazil) Natalia Chagas (Universidade Federal do Pampa, Brazil) Alessandro Girardi (Universidade Federal do Pampa, Brazil) A 13.56-MHz CMOS Active-Rectifier WPT With Dynamically Controllable Comparator for IMDs PRESENTER: Claudio Pedroso ABSTRACT. This paper presents the design of a 13.56 MHz active full-wave integrated rectifier for wireless powered implantable medical devices. The four diodes of a conventional passive rectifier are replaced by two cross-coupled PMOS transistors and two comparator-controlled NMOS switches to reduce the voltage drops of the diodes, so that the voltage conversion ratio and power conversion efficiency are improved. The proposed design also focuses on reducing the reverse current in the switches. It was simulated in a standard 65-nm CMOS process with an ideal AC input of 1.2 V and presented a maximum power conversion efficiency of 84% and a maximum output power of 590 μW at a nominal output voltage of 1.1 V.
10:12	Luis Henrique Rodovalho (Synopsis, Portugal) Orazio Aiello (University of Genova, Italy) An Inverter-Based Difference Differential Amplifier with Active Frequency Compensation ABSTRACT. This work presents an improved circuit of the Difference Differential Amplifier (DDA) based on Nauta’s inverter-based fully-differential amplifier. The proposed topology keeps the original Nauta DDA as the first stage and adds a second stage with feedforward common-mode cancellation and active frequency compensation. The circuit achieves 88 dB differential gain, 88 dB CMRR, 88 dB PSRR, 17.5 MHz GBW for a 15 pF load while consuming 1.17 mA for a 1.8 V supply voltage at room temperature.

09:00-10:30 Session 13D: IBERCHIP Poster 1

Chair:

Leonardo Soares (Instituto Federal do Rio Grande do Sul, Brazil)

Location: Carmenère

Heitor Huarachi (Universidade Federal do Pampa, Brazil)
Gabriel Cardoso (Universidade Federal do Pampa, Brazil)
Jiovana Gomes (Universidade Federal do Rio Grande do Sul, Brazil)
Sergio Bampi (Federal University of Rio Grande do Sul, Brazil)
Fabio Ramos (Universidade Federal do Pampa, Brazil)

Arquitetura para a Geração dos Elementos Sintáticos Residuais da Transformada do VVC (Architecture for Generating the Residual Syntax Elements of the VVC Transform)

ABSTRACT. In recent years, demand for video has grown significantly due to the heavy use of streaming platforms and remote work. Meeting this demand requires more efficient solutions. Versatile Video Coding (VVC) is an advanced video coding standard designed to deliver high video quality with excellent compression. However, this efficiency comes with greater processing complexity. A traditional way to handle this added complexity is to use hardware accelerators in the most critical stages. In modern coding formats, residual coding generates most of the data fed into entropy coding. Accelerating this processing makes it possible to avoid bottlenecks and idle time in the encoding flow. This work explores architectures for generating RSEs (Residual Syntax Elements) in transform-based modes, collectively named TB-RSE-arch.

Fabricio Lorenzon (UFRGS, Brazil)
Mateus Grellert (UFRGS, Brazil)

Projeto e Síntese de Multiplicadores com Foco em Alto Desempenho (Design and Synthesis of Multipliers Focused on High Performance)

ABSTRACT. Multiplier circuits are used in many important applications, such as computer vision and machine learning. However, these circuits are usually costly in area and energy compared with simpler arithmetic circuits such as adders. To evaluate multipliers with a focus on high performance and low energy consumption, this work presents an analysis of array multipliers for different bit widths. In addition to a combinational version for each case, two pipelined versions are proposed to maximize circuit performance for larger data sizes. Synthesis results for the XFAB 180nm standard-cell technology indicate that the model with the deeper pipeline reaches performance 5.78 higher than that of the combinational version, yielding gains of up to 49% in the energy-delay product metric.

Yasmin Souza Camargo (Federal University of Pelotas (UFPel), Brazil)
Matheus Isquierdo (Federal University of Pelotas (UFPel), Brazil)
Renira Soares (Federal University of Pelotas (UFPel), Brazil)
Daniel Palomino (Federal University of Pelotas (UFPel), Brazil)
Bruno Zatt (Federal University of Pelotas (UFPel), Brazil)
Felipe Sampaio (Federal Institute of Rio Grande do Sul (IFRS), Brazil)

Approximate Storage Evaluation at Intra-Frame Prediction in VVC Encoders

ABSTRACT. This paper explores approximate storage, which tolerates memory operation errors, to reduce energy consumption in intra-frame prediction for VVC encoders. We analyze the resilience levels of two memory regions: the Original and Neighbor Samples Buffers (OrigSB and NeighSB). Further, an SRAM with multiple operating levels is adopted to evaluate the energy savings. The resilience profiling shows the encoding efficiency drops for a wide range of scenarios, considering different video sequences, error rates, and VVC parameters. The results point to a substantial reduction in SRAM dynamic energy consumption (up to 58%) and promising error tolerance levels for the OrigSB memory. Meanwhile, NeighSB exhibited lower resilience potential, with significant coding efficiency drops and deterioration of subjective visual quality at the highest evaluated error rates.

Patrick Rosa (Universidade Federal de Pelotas, Brazil)
Daiane Freitas (Universidade Federal de Pelotas, Brazil)
Leonardo Müller (Universidade Federal de Pelotas, Brazil)
Guilherme Correa (Universidade Federal de Pelotas, Brazil)
Daniel Palomino (Universidade Federal de Pelotas, Brazil)

Avaliação de Ferramentas do Codificador AV1 para Interpolação de Pixels na Predição Inter-Quadros Fracionária (Evaluation of AV1 Encoder Tools for Pixel Interpolation in Fractional Inter-Frame Prediction)

ABSTRACT. Playing digital video is computationally expensive because it requires a large volume of data. Video compression is therefore essential to make the transmission and/or reception of this media feasible. Video encoders incorporate a series of tools to make compression more efficient. Among modern video encoders, Alliance for Open Media Video 1 (AV1) was released in 2018, developed by the AOMedia consortium. To achieve this performance and efficiency, complex coding tools were adopted in AV1. This paper presents a series of evaluations of the tools available in the AV1 video encoder, focusing on the interpolation process. Enabling the dual filter interpolation tool yields a 0.81% gain in coding efficiency for UHD videos, but this gain is considered negligible at other resolutions, which may not justify its use. The Regular filter is preferred, with 80.46% usage in the vertical direction and 89.5% in the horizontal direction. At 4K resolutions, about 21.42% of the encoding time is spent choosing interpolation filters.

Leonardo Luís Müller (Universidade Federal de Pelotas, Brazil)
Daiane Freitas (Universidade Federal de Pelotas, Brazil)
Patrick Rosa (Universidade Federal de Pelotas, Brazil)
Guilherme Corrêa (Federal University of Pelotas, Brazil)

Escolha dos Filtros de Interpolação da Estimação de Movimento Fracionária do AV1 Usando Aprendizado de Máquina (Choosing the Interpolation Filters of AV1 Fractional Motion Estimation Using Machine Learning)

ABSTRACT. Digital video has become increasingly present in our lives, spanning areas such as entertainment, health, and education. Playing digital video is costly in both computation and energy, since it requires a large amount of data, especially for high-quality content. Video compression is therefore essential to make the transmission or reception of this media feasible. Video encoders combine a series of tools that aim to make compression more efficient and reduce processing time. To achieve this performance and efficiency, complex coding tools were adopted in AV1, such as the adaptive filtering scheme applied to the interpolation filters used in the inter-frame prediction stage. This paper presents a machine-learning-based solution to accelerate the interpolation of fractional samples in the Motion Estimation (ME) stage. The predictive models showed a high accuracy rate. For Full HD videos, the models provide a 2.14% reduction in encoding time at a cost of 0.124% in compression efficiency. For HD videos, the time reduction was 1.84%, with a compression efficiency loss of 0.2195%. Thus, in both cases, the predictive models reduce encoding time with a small impact on compression efficiency.

10:30-11:00 Coffee break

Location: Carmenère

11:00-12:00 Session 14: Keynote - Manuel Delgado-Restituto "Integrated Microsystems for the Treatment of Neural Diseases"

Chair:

Guilherme Corrêa (Universidade Federal de Pelotas, Brazil)

Location: Malbec A+B

12:00-13:30 Lunch

13:30-15:00 Session 15A: Honey, I Shrunk the Circuits: Adventures in Nanotechnology

Chairs:

Marcel Walter (Technical University of Munich (TUM), Germany)
Yann Deval (Univ. Bordeaux, France)

Location: Malbec A+B

13:30	Simon Hofmann (Technical University of Munich, Germany) Marcel Walter (Technical University of Munich, Germany) Robert Wille (Technical University of Munich, Germany) Physical Design for Field-coupled Nanocomputing with Discretionary Cost Objectives ABSTRACT. Field-coupled Nanocomputing (FCN) represents a class of emerging post-CMOS technologies that achieve nanoscale computation without relying on the flow of electrical current. Despite their potential, existing physical design algorithms for FCN predominantly focus on minimizing either layout area or execution runtime, neglecting the complexity of real-world design constraints. In this work, we introduce the first physical design method for FCN that accommodates discretionary cost objectives, marking a significant advancement in the field. This approach integrates insights from both simulation and manufacturing, facilitating more comprehensive and optimized design solutions. We offer an open-source implementation and validate the proposed algorithm experimentally on a set of common benchmark functions, demonstrating its effectiveness across a range of different scenarios and cost objectives.
13:48	João V. C. Teixeira (Departamento de Ciência da Computação Universidade Federal de Minas Gerais (UFMG), Brazil) Poliana A. C. Oliveira (Centro Federal de Educação Tecnológica de Minas Gerais (CEFET-MG), Brazil) Renan A. Marks (Faculdade de Computação Universidade Federal de Mato Grosso do Sul (UFMS), Brazil) Omar P. V. Neto (Departamento de Ciência da Computação Universidade Federal de Minas Gerais (UFMG), Brazil) Enhancing DNA Analog Circuits Design Through Delayed Species Insertion ABSTRACT. Molecular computing, particularly DNA-based systems, offers immense potential for the creation of programmable biological devices. However, a critical challenge in advancing DNA computing is the lack of precise control over the time and sequence of chemical reactions within these circuits. The stochastic nature of molecular interactions, combined with the inherently parallel nature of reactions, makes synchronizing and ordering them difficult. Variations in reaction rates and the presence of noise, such as unintended leak reactions, further complicate this control, limiting the scalability and complexity of DNA-based circuits. In this paper, we explore the limitations of analog DNA circuits, focusing on the need for better mechanisms to regulate the timing and sequence of reactions. We argue that improving this control could address many existing problems, enabling the development of more complex and reliable molecular circuits. These advancements are essential for moving DNA-based computation closer to practical, real-world applications.
14:06	Gabriel Novy (Universidade Federal de Minas Gerais, Brazil) Julio Teodoro (Universidade Federal de Minas Gerais, Brazil) Jhonattan Ramírez (Universidade Federal de Minas Gerais, Brazil) Omar Neto (Universidade Federal de Minas Gerais, Brazil) Feasible all-optical OR and NOR logic gates in photonic crystals ABSTRACT. This work presents the design and simulation of photonic crystal-based OR and NOR logic gates, aimed at eliminating the need for dynamic control signals dependent on input combinations. Utilizing silicon-based photonic crystals operating at a 1550 nm wavelength, the gates exhibit enhanced performance, making them suitable for optical computing. Simulation results confirm the reliability of the proposed designs, with the OR and NOR gates achieving contrast ratios of 5.3 dB and 5.1 dB, respectively. Innovative waveguide junctions play a crucial role in minimizing signal loss and preserving signal integrity. This work advances photonic computing by offering a simplified control mechanism while maintaining high performance, paving the way for more energy-efficient and faster alternatives to traditional electronic logic gates. Future research directions include experimental validation and integration into more complex photonic circuits.
14:24	Ruan Formigoni (Universidade Federal de Vicosa, Brazil) Ricardo Ferreira (Universidade Federal de Vicosa, Brazil) Omar Neto (Universidade Federal de Minas Gerais, Brazil) José Augusto Nacif (Universidade Federal de Vicosa, Brazil) Network Collapsing Placement and Routing for Field-Coupled Nanocomputing ABSTRACT. The complementary metal-oxide semiconductor (CMOS) is the industry standard for chip fabrication. In recent decades, its miniaturization processes have become increasingly complex and expensive, with atomic limitations and ever-growing static power dissipation. Field-coupled nanocomputing has emerged to address these issues with technologies that use elements that are alternative to the traditional transistor and require no static power dissipation. In this field, the well-known NP-hard placement and routing problem in CMOS re-emerges, now with novel constraints. Our work provides a scalable solution for this NP-Hard problem, improving the area overhead compared to current state-of-the-art techniques, with a minor trade-off in time complexity. We achieve up to 23.15x area reduction with an average of 5.13x and runtime of only 8 milliseconds.
14:42	Emanuel Ruella (Universidade Federal de Viçosa, Brazil) Ricardo Ferreira (Universidade Federal de Viçosa, Brazil) Omar Neto (Universidade Federal de Minas Gerais, Brazil) José Nacif (Universidade Federal de Viçosa, Brazil) Development of More Robust and Stable Logic Gates Using Novel Parameter Values for SiDB ABSTRACT. As CMOS technology approaches its physical limits, there is a growing need to explore alternatives beyond CMOS, such as Silicon Dangling Bonds (SiDBs). SiDBs, utilizing Coulombic interactions, offer the potential for ultra-low energy consumption and high integration density. Recent research has introduced new parameter values for SiDB technology, specifically a Thomas-Fermi screening length of 1.8 and a relative permittivity of 4.10. This study presents a comprehensive library of logic gates based on these novel values, demonstrating significant improvements in stability and interference reduction. Our contributions include: (1) a library of standard Boolean function gates, (2) additional gates adapted from existing designs, and (3) a detailed comparison of gate performance using new versus old parameter values. The findings indicate that these new parameters enable more robust and compact circuit designs, advancing the potential of SiDB technology.

13:30-15:00 Session 15B: Emerging Techniques in FPGA and Reconfigurable Computing

Chair:

Antonio Carlos Beck (Universidade Federal do Rio Grande do Sul (UFRGS), Brazil)

Location: Assemblage

13:30	Icaro G. S. Moreira (Universidade Federal de Viçosa, Brazil) Lucas Bragança (Universidade Federal de Viçosa, Brazil) Olavo Silva (Universidade Federal de Viçosa, Brazil) Alysson Silva (Universidade Federal de Viçosa, Brazil) Ricardo S. Ferreira (Universidade Federal de Viçosa, Brazil) José A. M. Nacif (Universidade Federal de Viçosa, Brazil) Bridging the Gap: Accelerating Random Forests on FPGAs with High-Bandwidth Memory ABSTRACT. In memory-bound problems, Field Programmable Gate Arrays (FPGAs) have traditionally underperformed compared to Graphics Processing Units (GPUs) due to their lower memory bandwidth. However, the advent of High-Bandwidth Memory (HBM) in FPGAs has significantly enhanced their performance, achieving bandwidths up to 425 GB/s. Additionally, FPGAs offer the advantage of customizable accelerators for domain-specific tasks, potentially outperforming general-purpose GPU architectures. This work focuses on accelerating random forest algorithms on FPGAs, leveraging their customization capabilities to efficiently manage control flow structures such as decision branches. Despite these advancements, FPGAs remain challenging to program, requiring a deep understanding of hardware design. To address this, we propose a new hardware generator that integrates necessary tools into a cohesive workflow, simplifying FPGA development. Validated on a Xilinx Alveo FPGA, the design utilizes 32 HBM channels and reaches a performance of 8 billion samples per second, offering a practical solution for memory-bound machine learning tasks in high-performance computing environments.
13:48	Vilmondes R. Silva (Federal University of Minas Gerais (UFMG), Brazil) Dalton M. Colombo (Federal University of Minas Gerais (UFMG), Brazil) Tomás P. Corrêa (Federal University of Minas Gerais (UFMG), Brazil) Low-cost FPGA-based Digital-to-Time Converter ABSTRACT. In this paper, we present a Digital-to-Time Converter (DTC) based on a low-cost FPGA platform, utilizing the Vernier with oscillators architecture. A DTC is a circuit that converts digital information into a very accurate time output. The proposed Vernier DTC is implemented on an Altera Cyclone III FPGA chip and features tunable resolution through the periodic relationships between two PLL-generated signals. The linearity of the system was measured for a resolution of 990 ps, showing DNL and INL values of -0.41 to +0.54 and -0.82 to +0.17, respectively. It has a range of 99 ns and an estimated power consumption of 89 mW. Furthermore, measurements demonstrate that the system achieves a maximum resolution of 12.5 ps, utilizing only 2% of the FPGA resources. Additionally, the logical and physical synthesis of the proposed design was carried out using a commercial 350 nm CMOS technology, and the estimated power consumption and silicon area are 1.75 mW, and 237 µm x 190 µm, respectively.
14:06	Ian Kersz Amaral (UFRGS, Brazil) Michael Jordan (UFRGS, Brazil) Hiago Mayk G. de A. Rocha (UFBA, Brazil) Felipe Kalinski Ferreira (HT Micron, Brazil) Jose Rodrigo Azambuja (Federal University of Rio Grande do Sul, Brazil) Fernanda Kastensmidt (UFRGS, Brazil) Antonio Carlos Schneider Beck (Universidade Federal do Rio Grande do Sul, Brazil) Exploiting Design Flexibility in Multi-Tenant Multi-FPGA Edge Systems ABSTRACT. Multi-FPGA architectures are increasingly used in edge environments for their reconfigurable nature, enabling high performance and energy efficiency by tailoring designs to specific workloads. However, in multi-tenant edge environments with diverse and dynamic workloads, navigating design heterogeneity while managing resource constraints is challenging. Efficient provisioning requires selecting the most suitable set of designs to accommodate various task requests with different behaviors, balancing design variety with limited resources. In this paper, we propose a flexible framework that bridges design heterogeneity and efficient resource management in multi-tenant multi-FPGA edge systems. The framework leverages a comprehensive design pool and dynamic strategies for design selection and task distribution. Our results show 4.7x and 3x improvements in makespan and energy efficiency, respectively, over traditional non-adaptive methods.
14:24	Emanuel Trabes (Service d’electronique et de Microelectronique, University of Mons, Mons-Belgium, Belgium) Aymen Zayed (Service d’electronique et de Microelectronique, University of Mons, Mons-Belgium, Belgium) Carlos Valderrama (Service d’electronique et de Microelectronique, University of Mons, Mons-Belgium, Belgium) Jimmy Tarrillo (Universidad de Ingenieria y Tecnologia, Peru) Design Exploration of DWT-Based Feature Extraction Using FPGA for High-Performance Signal Processing ABSTRACT. The discrete wavelet transform (DWT) is commonly used for feature extraction in machine learning applications. Since these applications are frequently deployed in portable systems with limited computational resources, FPGA-based hybrid hardware/software solutions might be a viable choice. This article provides an analysis of various 4-level db4 DWT and feature extraction techniques implemented on the Zynq 7020 device. Alternative DWT versions include fixed-point and floating-point implementations, cascade and single-core reuse architectures, as well as designs in HDL and VHDL. The feature extraction process considers mean, energy, and entropy. It has also been implemented in an architecture that efficiently reuses these computational cores. These versions are compared in terms of accuracy, resources used, performance, and power consumption.
14:42	Everton Alceu Carara (UFSM, Brazil) Leonardo Londero de Oliveira (UFSM, Brazil) Bruno Henrique Spies (UFSM, Brazil) Mathias Cirolini Michelotti (UFSM, Brazil) Vinicius Gabriel Schultz (UFSM, Brazil) Evaluating Multiplier-Less CNNs in RISC-V Architecture PRESENTER: Bruno Henrique Spies ABSTRACT. In recent years, Convolutional Neural Networks (CNNs) have emerged as Machine Learning (ML) became a popular approach to solving problems in distributed computing settings such as mobile devices and the Internet of Things (IoT). It is well known that local computation at edge devices is preferable to transmitting a huge amount of data to run ML algorithms at a central node. In this sense, RISC-V has the research community’s attention as a flexible and royalty-free architecture for embedded processors and IoT devices. Although the latest research on RISC-V and CNNs has focused on instruction set architecture (ISA) customization to speed up the convolution process, this work investigates the impact on inference execution time of replacing multiplication instructions with shifts in multiply-and-accumulate (MAC) operations. Compared to slow multi-cycle multiplication instructions, our experiments showed an inference throughput speedup ranging from 1.45x to 1.95x, with negligible impact on memory footprint and employing only the base integer RISC-V ISA (RV32I).

13:30-15:00 Session 15C: Circuits, Systems and Parallel Processing of Visual Signal

Chairs:

Fabio Ramos (Universidade Federal do Pampa, Brazil)
Luciano Agostini (UFPel - Universidade Federal de Pelotas, Brazil)

Location: Malbec C

13:30	Vinicius Borges (UFPel, Brazil) Murilo Perleberg (UFPel, Brazil) Marcelo Porto (UFPel, Brazil) Luciano Agostini (UFPel, Brazil) Hardware Design for VVC Angular Intra Prediction Modes with Coding Efficiency Awareness ABSTRACT. Versatile Video Coding (VVC) is the current state-of-the-art video coding standard, and it was developed to provide very high coding efficiency for different types of visual information. As a drawback, VVC demands a much higher computational cost than previous standards, which can affect current trends such as dedicated hardware for mobile devices and real-time applications. This work focuses on the angular modes tool of the VVC intra-frame prediction and presents a heuristic to reduce its computational cost, together with its high-throughput hardware design. The proposed heuristic shows a decrease of 18.72% in the computational cost of the entire VVC encoder, with a 2.17% loss in coding efficiency. The designed hardware uses an area of 4,838.7 k NAND2 gates and is capable of encoding HD 1080p@30fps videos running at 124.3 MHz, with a power dissipation of 270.5 mW.
13:48	Vinícius Faria (Unipampa, Brazil) Jiovana Gomes (UFRGS, Brazil) Sergio Bampi (UFRGS, Brazil) Fabio Ramos (Unipampa, Brazil) High-Performance Binary Arithmetic Encoder with Multiple Bypass Bin Scheme for VVC CABAC ABSTRACT. Video is an essential part of the human experience these days, with a broad range of applications, from entertainment to remote work applications. With this in mind, new techniques are mandatory to comply with this data type's ever-increasing demand and quality requirements. Versatile Video Coding (VVC) is the newest member of a family of video coding standards, begotten to tackle the challenges of the new video processing landscape. VVC follows the hybrid codec paradigm, which is composed of predictions, transforms, and entropy coding. The Context Adaptive Binary Arithmetic Encoder (CABAC) is the chosen algorithm for the entropy stage, but it has some differences compared with past versions used in predecessor video codecs. Thus, a hardware circuit for VVC CABAC is a desirable solution for coping with real-time processing and energy-efficient scenarios. More significantly, the bottleneck step is the Binary Arithmetic Encoder (BAE), which is the focus of this work, and where a design named VArchBAE is introduced. A Multiple Bypass Bin Scheme (MBBS) is also integrated into the architecture to improve the throughput. To the best of the authors' knowledge, this is the first BAE architectural solution found in the literature for the VVC standard.
14:06	Laiane Souza (Federal University of Pelotas (UFPel), Brazil) Yasmin Souza Camargo (UFPEL, Brazil) Bruno Zatt (Federal University of Pelotas (UFPel), Brazil) Sergio Bampi (Federal University of Rio Grande do Sul (UFRGS), Brazil) Felipe Sampaio (Federal Institute of Rio Grande do Sul (IFRS), Brazil) Video Decoder Optimization for Speculative Motion Compensation for Near-Data Processing Exploitation ABSTRACT. This paper presents a video decoder optimization strategy for improving the performance of speculative implementations of Motion Compensation (MC) at near-data processing (NDP) platforms. To fully exploit the 3D-DRAM memory access parallelism provided by NDP-based processing elements, the correlations between motion fields of neighboring frame regions should be exploited in speculative approaches. As our first contribution, a detailed analysis is presented to understand the behavior of fractional motion vectors to be decoded by the MC. The statistical distributions of fractional MV positions are evaluated, providing key insights to be considered by speculative approaches for MC decoding. Then, a video decoder optimization is introduced to adjust the fractional MV coordinates for each predefined interpolation window (2Kx128) according to the most frequent fractional position. As a result, the decoded video quality losses are evaluated, providing negligible PSNR drops for video decoder experiments with higher QP values. Still, dynamic approaches should be addressed to adapt the optimization strengths to minimize quality drops while keeping the performance of NDP-based speculative MC decoding.
14:24	Vitória Fabricio (Video Technology Research Group, Federal University of Pelotas, Brazil) Iago Storch (Video Technology Research Group, Federal University of Pelotas, Brazil) Daniel Palomino (Video Technology Research Group, Federal University of Pelotas, Brazil) Processing Time Evaluation of the Classification Step in the Adaptive Loop Filter of VVC under Multiple Programming Paradigms ABSTRACT. Several tools were introduced by the Versatile Video Coding (VVC) standard to enhance compression, with the Adaptive Loop Filter (ALF) being one such tool that significantly enhances visual quality. Although it provides coding efficiency gains, the ALF also poses a substantial computational burden. To address this issue, this paper evaluates the processing time of the classification step in the ALF process of VVC encoders considering different programming paradigms: a sequential CPU implementation, a Single Instruction Multiple Data (SIMD) implementation, and a customized parallel implementation using CUDA to be executed on GPUs. The results showed that the SIMD-optimized implementation significantly outperforms the fully-scalar implementation. Although the GPU implementation is faster than the fully-scalar one, it remains slower than the SIMD-optimized one due to CPU-GPU communication overhead. With more tasks, the GPU could potentially surpass the SIMD-optimized processing time.
14:42	Ismael Seidel (Federal University of Santa Catarina (UFSC), Brazil) André Filipe da Silva Fernandes (Federal University of Santa Catarina (UFSC), Brazil) Jose Luis Güntzel (Federal University of Santa Catarina (UFSC), Brazil) A Parallel JPEG Pleno Baseline Block-Based Profile Light Field Encoder using OpenMP ABSTRACT. Light Fields (LFs) are a plenoptic image modality that provides more information on light rays, making them an excellent representation for immersive media. To compress such a modality, the Joint Photographic Experts Group (JPEG) committee created the JPEG Pleno Part 2 standard with two profiles. This work focuses on the reference encoder implementation for the Baseline Block-Based Profile (BBBP), called JPEG Pleno Model (JPLM). Our main contribution lies in the proposal and analysis of a parallel implementation of JPLM using OpenMP. We show that it is possible to accelerate encoding from nearly 2 to 10 times when using 2 to 16 threads, with a memory overhead ranging from 15% up to 78%, depending on the LF size. Moreover, the speedup comes with no cost in terms of coding efficiency, i.e., the LFs encoded with the proposed parallel version are bit-exact matches to ones encoded with the sequential version.

13:30-15:00 Session 15D: IBERCHIP Poster 2

Chair:

Agustin Galetto (Universidad Tecnológica Nacional, Argentina)

Location: Carmenère

Dalton Colombo (UFMG, Brazil)
Daniel Carvalho Lott (UFMG, Brazil)
Pedro Sartori Locatelli (Dalhousie University, Canada)
Kamal El-Sankary (Dalhousie University, Canada)
Greg Burley (National Research Council Canada, Canada)

± 0-10 mA CMOS Current Driver

ABSTRACT. This work proposes a topology for a ± 10 mA current driver. The circuit was implemented in a commercial 350 nm CMOS technology with a symmetric supply voltage of ± 3.3 V. The output current value is controlled by an 11-bit digital word. On the test chip, four identical drivers were designed to supply output current simultaneously.

Alvaro Costa Junior (Universidade de São Paulo, Brazil)
Raphael Lopes Pinheiro (Universidade de São Paulo, Brazil)
Maximiliam Luppe (Universidade de São Paulo, Brazil)

Implementation and Analysis of a TRNG Based on RO-PUF

ABSTRACT. This article presents a True Random Number Generator (TRNG) based on a Ring Oscillator Physical Unclonable Function (RO-PUF) circuit, which leverages the unique characteristics of signal propagation time variations in digital circuits. With the growing concern surrounding security in electronic devices, the demand for robust and reliable sources of randomness becomes increasingly evident, particularly in critical applications such as cryptography, authentication, and data protection. The proposed TRNG aims to capture entropy from the ring oscillators to facilitate the generation of random numbers. The methodology encompasses the implementation of the system on an FPGA using the Quartus Prime tool, with Verilog for the design. This is followed by a series of statistical tests based on NIST guidelines, implemented in Python, to validate the randomness of the generated numbers.

Cristhofer Cunha Marques (Federal University of Pampa - UNIPAMPA, Brazil)
Victor Matheus Lima (Federal University of Pampa - UNIPAMPA, Brazil)
Crístian Müller (Federal University of Pampa - UNIPAMPA, Brazil)
Alessandro Gonçalves Girardi (Federal University of Pampa - UNIPAMPA, Brazil)
Paulo César Comassetto de Aguirre (Federal University of Pampa - UNIPAMPA, Brazil)

High Level Design of CIFB and CIFF Incremental Σ∆ ADC Architectures for Biomedical Signals

ABSTRACT. In the last decade, there has been an increasing interest in biomedical electronic devices. This has driven the development of smart wearable devices, such as smartwatches, for real-time monitoring of individuals’ health. These accessories are characterized by sensors capable of capturing the environmental context, expanded memory, processors for multitasking, and wireless protocols for autonomous operation. These devices collect precise physiological data in real time through non-invasive processes such as electrocardiogram (ECG) and photoplethysmography (PPG). The capture of physiological signals by wearable devices requires analog-to-digital converters (ADCs) capable of converting low-frequency signals into medium- and high-precision data with low power consumption. In this article, the operating mode of incremental sigma-delta ADCs is reviewed, and the structures of the CIFB and CIFF architectures are compared through the implementation of a fourth-order incremental sigma-delta (IΣ∆) ADC for a 250-Hz signal bandwidth. The two modulator architectures achieved a very similar effective bit rate and SNR as the number of cycles was varied. Another factor that corroborates these results is that the simulations were performed at a high level in Matlab/Simulink, where the components are treated in an idealized way. The integrators, filters, and quantizers work without the losses or errors of a real circuit, such as thermal noise, offset errors, or temperature variations.

Bernardo Correa (Federal University of Paraná, Brazil)
Bernardo Leite (Federal University of Paraná, Brazil)

Biasing analysis of an RF CMOS cascode power amplifier

ABSTRACT. This paper presents an analysis of a cascode radiofrequency power amplifier (PA) designed using 130 nm CMOS technology, focusing on balancing key performance metrics such as linearity, efficiency, and gain. By investigating the behavior of the PA across a range of bias voltages, with the common gate bias (Vgbias) varying from 0.6 V to 2.9 V and the common source bias (Vsbias) assuming 0.4 V, 0.6 V, and 1.1 V, it is possible to tailor the behavior to better suit mobile transmitter applications. Three simulations were performed using harmonic balance analysis: first, a load-pull simulation to find the load impedance for the best OCP1dB and its value; then, a compression point analysis to acquire both the gain and the power added efficiency (PAE) at OCP1dB; and finally, a sweep of the input power to verify the saturated output power and maximum PAE values. Results reveal distinct combinations of balance between linearity, gain, and efficiency. The best performance results include a gain of 20.8 dB, a peak PAE of 40.1%, and a 1 dB compression point of 15 dBm, produced by the cascode in a configuration of Vsbias = 0.6 V and Vgbias = 2.7 V.

Bruna Henning Pereira (Univali, Brazil)
Cesar Albenes Zeferino (Univali, Brazil)

Developing a Wearable Device Solution for Seizure Prediction of Patients with Epilepsy

ABSTRACT. Epilepsy affects approximately 50 million people worldwide, 2% of whom live in Brazil. It significantly impacts the quality of life of these patients, putting their physical integrity and lives at risk. In addition, epilepsy can affect the mental health of its sufferers, causing problems such as anxiety and depression due to, for example, the unpredictability of seizures. In this context, this work proposes a wearable device in the form of a bracelet capable of alerting the patient of an imminent seizure. The device will use an embedded system responsible for capturing, filtering, and classifying heart rate variability signals to detect an imminent seizure and alert its user. To do this, it will employ time domain and frequency domain analysis and spectral density analysis to obtain data on heart rate changes. It will then use a support vector machine to classify these signals and support decision-making regarding the issuance of the alert.

15:00-15:30 Coffee break

Location: Carmenère

15:30-16:30 Session 16: Keynote - SSCS - Sergio Bampi "Approximate Computing: Techniques and Framework for Energy-Efficient CMOS Accelerators"

Chair:

Yann Deval (Univ. Bordeaux, France)

Location: Malbec A+B

16:30-17:00 Session 17: Closing Session

Chairs:

Alexandra Zimpeck (Cadence, Brazil)
Fabián Olivera (CEFET-RJ, Brazil)

Location: Malbec A+B

17:30-20:30 Farewell party