High-Speed and Low-Energy Dual-Mode Logic based Single-Clock-Cycle Binary Comparator
ABSTRACT. This paper presents an energy-efficient single-clock-cycle binary comparator based on Dual-Mode Logic (DML) and optimized to operate in the dynamic mode. A parallel-prefix architecture is used to ensure high speed, whereas low power consumption is achieved by reducing the switching activity of internal nodes. Domino Logic (DL) and DML implementations are compared in terms of delay and energy at different supply voltages in a 32 nm technology. We demonstrate an average improvement of 5% in both energy and delay when the DML design operates in the dynamic mode compared to its conventional domino counterpart. Moreover, the DML design operating in the static mode reduces energy consumption by up to 43% compared to the equivalent domino logic-based implementation.
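To illustrate what a parallel-prefix comparison looks like at the bit level, the following is a minimal behavioral sketch in Python. It is not the DML circuit from the paper: the function name `compare_prefix`, the `width` parameter, and the per-bit signals `g` ("greater") and `e` ("equal") are assumptions chosen for illustration. Each bit position produces a (g, e) pair, and adjacent pairs are merged with an associative operator, so the reduction takes about log2(width) levels.

```python
def compare_prefix(a: int, b: int, width: int = 8) -> tuple[bool, bool]:
    """Return (a > b, a == b) for unsigned operands of `width` bits (width >= 1)."""
    # Per-bit signals, ordered from LSB to MSB:
    #   g = a_i AND NOT b_i  (this bit alone makes a greater)
    #   e = a_i XNOR b_i     (this bit is equal)
    nodes = []
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        nodes.append((bool(ai and not bi), ai == bi))

    # Tree reduction: merge adjacent (low, high) pairs level by level.
    while len(nodes) > 1:
        merged = []
        for j in range(0, len(nodes) - 1, 2):
            (g_lo, e_lo), (g_hi, e_hi) = nodes[j], nodes[j + 1]
            # The high group decides unless it is equal, then the low group decides.
            merged.append((g_hi or (e_hi and g_lo), e_hi and e_lo))
        if len(nodes) % 2:
            merged.append(nodes[-1])  # an unpaired top group passes through
        nodes = merged
    return nodes[0]
```

Because the merge operator is associative, the pairs can be grouped in any order, which is what allows a hardware implementation to evaluate the levels in parallel.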
EDP Optimization of Parallel Applications via CPU Frequency Scaling on AMD Processors
ABSTRACT. Dynamic Voltage and Frequency Scaling (DVFS) has been widely used to improve the use of computational resources when a system is executing parallel applications. Moreover, because parallel applications behave differently (e.g., in CPU usage and shared memory accesses), DVFS methods must be able to deal with the characteristics of the application at hand. However, as we show in this paper, the DVFS governors already available in many Linux distributions are not capable of dealing with such a scenario, providing a trade-off between performance and energy consumption (energy-delay product, EDP) that is far from the best possible.
Given that, we propose a dynamic run-time approach to optimize the EDP of parallel applications running on AMD processors that automatically selects the ideal CPU frequency and Boosting operating mode according to the characteristics of the application at hand. When executing sixteen well-known parallel applications on two multicore architectures, we show that our approach provides EDP improvements of up to 38% when compared to the ondemand DVFS governor.
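As a small illustration of the metric, the sketch below computes the energy-delay product (energy multiplied by execution time) for a few frequency and boost configurations and picks the one with the lowest EDP. The measurement values, the dictionary keys, and the helper names `edp` and `best_config` are hypothetical and are not taken from the paper, whose approach makes this choice at run time rather than from a pre-measured table.

```python
def edp(energy_joules: float, time_seconds: float) -> float:
    """Energy-delay product: lower is better."""
    return energy_joules * time_seconds


def best_config(measurements: list[dict]) -> dict:
    """Return the measured configuration with the smallest EDP."""
    return min(measurements, key=lambda m: edp(m["energy_j"], m["time_s"]))


# Hypothetical measurements for one application (illustrative numbers only).
samples = [
    {"freq_mhz": 1400, "boost": False, "energy_j": 90.0, "time_s": 12.0},
    {"freq_mhz": 2200, "boost": False, "energy_j": 100.0, "time_s": 8.0},
    {"freq_mhz": 3000, "boost": True, "energy_j": 150.0, "time_s": 7.0},
]

choice = best_config(samples)
```

With these made-up numbers, the lowest frequency has the lowest energy and the highest frequency has the shortest run time, yet the middle setting gives the best EDP. This shows why neither a pure performance policy nor a pure power-saving policy necessarily minimizes the metric.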
Electrical Evaluation of Logic Network Generation Methods for Supergates using SwitchCraft
ABSTRACT. Recent developments in electronic design automation tools have vastly reduced the design cost of static CMOS complex gates (SCCGs), enabling an alternative approach to logic synthesis. Despite the many design strategies targeting the transistor networks of SCCGs, comparisons among them are often limited to metrics such as the number of transistors or the total circuit stack, and lack an in-depth electrical evaluation. This work presents an electrical comparison of three different design techniques. The study evaluates the 3982 logic functions of the 4-input P-class and shows that topologies that optimize the pull-up and pull-down networks individually present better overall electrical characteristics. The results also suggest that reducing the logic gate stack or the number of transistors does not necessarily lead to better performance, showing that optimizing only for these parameters does not translate into electrical improvement.
Exploring Approximate Adders for Power-Efficient Harmonics Elimination Hardware Architectures
ABSTRACT. This paper explores approximate adders (AA) in a harmonic elimination system using Least Mean Square (LMS) filters. The Lower-Part-OR Adder (LOA), Error-Tolerant Adder (ETA-I), truncation adder (Trunc), and Copy adder are applied throughout the harmonic elimination system. Since the filtering system is a 16-bit circuit, the approximation level (K) of the approximate part of the adders is varied from 1 to 8. The Root-Mean-Square Error (RMSE) and Mean Absolute Error (MAE) metrics show that the filtering is efficient for all AA circuits with K=4. However, the LOA remains efficient at filtering the signal up to K=5, but with higher power dissipation. Therefore, the results point to "Copy_b" (K=4) as the most efficient AA for the harmonic elimination system, with 7.5% less area and 21.8% less power than the system with precise adders.
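For readers unfamiliar with approximate adders, the sketch below models two of them in Python and measures their error with RMSE. It follows the commonly published definitions and is an assumption about, not a copy of, the circuits evaluated in the paper: the LOA replaces the K least significant bits with a bitwise OR and feeds the AND of the top approximate bits into the exact upper adder as carry-in, while the truncation adder here simply zeroes the K low bits. The function names and the requirement k >= 1 are illustrative choices.

```python
import math


def loa_add(a: int, b: int, k: int) -> int:
    """Lower-Part-OR Adder: OR the k low bits, add the upper bits exactly (k >= 1)."""
    mask = (1 << k) - 1
    lower = (a | b) & mask
    # Carry into the exact part is approximated by ANDing bit k-1 of both operands.
    carry_in = (a >> (k - 1)) & (b >> (k - 1)) & 1
    upper = (a >> k) + (b >> k) + carry_in
    return (upper << k) | lower


def trunc_add(a: int, b: int, k: int) -> int:
    """Truncation adder (assumed variant): ignore the k low bits and output zeros there."""
    return ((a >> k) + (b >> k)) << k


def rmse(adder, k: int, pairs) -> float:
    """Root-mean-square error of `adder` against exact addition over input pairs."""
    errors = [(adder(a, b, k) - (a + b)) ** 2 for a, b in pairs]
    return math.sqrt(sum(errors) / len(errors))
```

Sweeping `k` over a set of representative operand pairs and plotting `rmse` gives the kind of accuracy-versus-approximation-level trade-off the abstract refers to. Power and area, which this model does not capture, require hardware synthesis.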
A Power-Efficient FFT Hardware Architecture Exploiting Approximate Adders
ABSTRACT. This work presents an energy-efficient Fast Fourier Transform (FFT) hardware architecture exploiting approximate adder circuits. The FFT hardware architecture consists of a fixed-point, fully sequential architecture with a radix-2 decimation-in-time (DIT) butterfly. In this paper, we explore a set of approximate adders (LOA, ETA-I, Copy-A, Copy-B, Trunc0, Trunc1) in the butterfly by varying the approximation level (K term). The Root-Mean-Square Error (RMSE) metric shows which approximation level allows FFT processing without significant signal loss. The results show that our best proposed FFT, employing the Trunc0 approximate adder with K=10, saves up to 35% of power dissipation compared to the FFT with the original radix-2 butterfly using the synthesis tool operators.
A Library of High-Level Models for the Simulation of DC-DC Converters
ABSTRACT. Custom integrated DC-DC converters are frequently used to power system-on-chip architectures, achieving high-quality voltage regulation at the best possible efficiency. However, high-level models of such converters are usually incompatible with analog simulators and require the designer to re-assemble the circuit in the final design environment to verify the design with real transistor models. This process is both time-consuming and error-prone. This paper presents a library of parametrizable, high-level macro-models for DC-DC converters, consisting of custom Verilog-A modules and cells from the standard libraries. The macro-models are compatible with analog simulators and offer the designer a significant amount of flexibility.
RF-DC Multiplier for RF Energy Harvester based in 32nm and TFET technologies
ABSTRACT. In this work, we study the effect of technology scaling on different full-wave rectifier topologies using the Cross-Coupled Differential Drive (CCDD) strategy. For a conventional CCDD scaled from 90nm to 32nm, the PCE and VCE remain the same, while a large degradation of the dynamic range and sensitivity is observed. This effect can be slightly limited by using a self-body-bias CCDD topology. However, the use of TFETs makes it possible to avoid this degradation and provides a large VCE and output voltage for input voltages lower than 300mV. To extend this VCE to input voltages > 300mV, we use a CCDD topology with increased load-driving capability. Interestingly, this not only increased the output voltage for large Vin but also yielded a larger PCE than expected for this topology.
ABSTRACT. This paper presents experimental results showing that the energy contained in beta radiation can be harvested by using diodes. A single BPW34 photodiode generates around
12 pA dc, so enough to power integrated analog blocks.
A Three-Stage Charge Pump with Forward Body Biasing in 28 nm UTBB FD-SOI CMOS
ABSTRACT. Energy harvesting techniques provide solutions for powering battery-free circuits or even for charging storage elements such as batteries or supercapacitors. In this paper, a three-stage charge pump suitable for thermoelectric and photovoltaic energy harvesting is designed in a 28 nm ultra-thin body and buried oxide (UTBB) fully-depleted silicon-on-insulator (FD-SOI) CMOS technology. Taking advantage of the FD-SOI substrate characteristics, the forward-body-biasing (FBB) technique is used to improve the switch conductances. Extensive simulation results validate the proper operation of the harvesting system at a minimum input voltage of 200 mV and show a peak efficiency of 56% at an input voltage of 300 mV and a load current of 100 nA.
High-speed Hardware Accelerator for Trace Decoding in Real-Time Program Monitoring
ABSTRACT. Multicore processors are currently the focus of new and future critical-system architectures. However, they introduce new problems with regard to safety and security requirements. Real-time control flow monitoring techniques have been proposed as solutions to detect the most common types of program errors and security attacks. We propose a new way to use the latest debug and trace architectures to achieve full and isolated real-time control flow monitoring. We present an online trace decoder FPGA component as a solution in the search for scalable and portable monitoring architectures. Our FPGA accelerator achieves real-time CPU monitoring while using only 8% of the resources of a Zynq-7000 FPGA.
Reliability Analysis in Less than 200 Lines of Code
ABSTRACT. Answer Set Programming (ASP) is proposed as a compact and versatile approach to circuit analysis. Using upsets in registers as an example, we demonstrate how to perform reliability analysis in less than 200 lines of code. Through an efficient problem encoding, we achieve an input data format similar to a Verilog netlist, so that extensive preprocessing is avoided. No development of algorithms is required, as the analysis relies on elaborate and highly optimized ASP solvers. Exemplary results for a wide range of circuits are presented, and potential optimizations are pointed out.
A CMOS Implementation of the Tent Map for Random Number Generation
ABSTRACT. A new tent-map-based random number generator (RNG) is designed in TSMC 65 nm CMOS technology. Simulation results verify that the generated random sequences successfully pass the randomness tests in the FIPS-140-2 and NIST 800-22 test suites. In contrast to other studies in the literature, our RNG satisfies the randomness tests without post-processing. Moreover, the bit generation rate can be increased in exchange for higher power consumption. Thus, with the architecture used in this work, robust RNGs needed for security applications can be implemented with higher data rates.
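The tent map itself can be sketched numerically as follows. This is an idealized software model, not the analog CMOS circuit of the paper: the slope `mu = 1.9999`, the seed, the threshold of 0.5 for bit extraction, and the function names are all assumptions made for illustration. A slope of exactly 2 is avoided because in binary floating point the iteration tends to collapse to zero after a few dozen steps.

```python
def tent_step(x: float, mu: float) -> float:
    """One tent map iteration on [0, 1]: rise with slope mu, then fall with slope mu."""
    return mu * x if x < 0.5 else mu * (1.0 - x)


def tent_bits(seed: float, count: int, mu: float = 1.9999) -> list[int]:
    """Iterate the map from `seed` and emit one bit per step by thresholding at 0.5."""
    x = seed
    bits = []
    for _ in range(count):
        x = tent_step(x, mu)
        bits.append(1 if x >= 0.5 else 0)
    return bits


bits = tent_bits(seed=0.37, count=20000)
ones_fraction = sum(bits) / len(bits)  # a crude balance check, not a full test suite
```

A balanced fraction of ones is only a necessary condition for randomness. Suites such as FIPS 140-2 and NIST SP 800-22 apply many further statistical tests, and a deterministic software model like this one is not a substitute for a physical entropy source.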
ABSTRACT. Most algorithms used in VLSI CAD tackle NP-hard problems and face scalability issues arising from ever-increasing circuit sizes. Partitioning enables a divide-and-conquer strategy, allowing complex algorithms to be used by reducing the problem to instances of smaller size. Several open-access tools have emerged to tackle the partitioning problem. This work performs a comprehensive evaluation of four partitioning tools, investigating their performance on actual benchmarks. We investigate graph and hypergraph partitioning, considering different graph models of hypergraphs and edge-weighting schemes. The analysis compares graph and hypergraph partitionings in terms of hyperedge cut, number of terminals, and runtime. Finally, the robustness of the tools is evaluated by testing different numbers of final partitions and how they are balanced. The results show differences of up to 2X in hyperedge cut and more than 1000X in runtime. The presented data can help define the applicability and advantages of each tool and possible methods to alleviate their shortcomings.
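Two of the notions used above can be made concrete with a short sketch: the hyperedge cut of a partition, and a graph model of a hypergraph. The clique expansion with edge weight 1/(k-1) for a k-pin hyperedge is one common weighting scheme and is assumed here for illustration. The abstract does not state which models or schemes the evaluated tools use, and the function names and the toy netlist are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def hyperedge_cut(hyperedges, part) -> int:
    """Count hyperedges whose pins fall into more than one partition."""
    return sum(1 for edge in hyperedges if len({part[v] for v in edge}) > 1)


def clique_expansion(hyperedges) -> dict:
    """Graph model of a hypergraph: each k-pin hyperedge becomes a clique
    whose edges carry weight 1 / (k - 1) (an assumed weighting scheme)."""
    weights = defaultdict(float)
    for edge in hyperedges:
        pins = sorted(set(edge))
        if len(pins) < 2:
            continue  # single-pin nets contribute no graph edges
        for u, v in combinations(pins, 2):
            weights[(u, v)] += 1.0 / (len(pins) - 1)
    return dict(weights)


# Toy netlist: three nets over four cells, split into two partitions.
nets = [("a", "b", "c"), ("c", "d"), ("a", "d")]
assignment = {"a": 0, "b": 0, "c": 0, "d": 1}
cut = hyperedge_cut(nets, assignment)
graph = clique_expansion(nets)
```

The graph model matters because a graph partitioner minimizes the weighted edge cut of `graph`, which only approximates the hyperedge cut of the original netlist. This is one reason graph and hypergraph tools can report different cut quality on the same benchmark.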
ABSTRACT. One challenge imposed by the ubiquitous computing of embedded systems is the need for power- and energy-efficient implementations, particularly because many of these systems are battery-operated. In this sense, tailored application-specific processors can meet the resource requirements of a specific application in the most efficient way. In this paper, we present TailoredCore, a design methodology to generate application-specific processors based on a core architecture implementation. This methodology analyzes the application to be executed and produces a customized RISC-V core with the required resources, while reducing the hardware overhead due to, for instance, unneeded instructions and registers. Using TailoredCore, we achieve savings of up to 38% in registers and 12% in logic elements when generating cores for five CHStone benchmark applications and implementing them on an FPGA. These area savings also correspond to a reduction in the required power and energy.
Analog and RF Circuit Constrained Optimization Using Multi-Objective Evolutionary Algorithms
ABSTRACT. This paper presents a simulation-based optimization method for automatic sizing in analog and RF IC blocks. It introduces a combination of a state-of-the-art multi-objective evolutionary algorithm (EA) with a new constraint handling approach to effectively explore the high-dimensional constrained design space, typical in every analog and RF IC block design. An additional modification in the core of the EA is also proposed for efficiently handling mixed continuous-integer parameter search spaces. The methodology is illustrated in a Nested-Current-Mirror amplifier and a Wideband Low Noise Amplifier, achieving better results than typical constraint handling approaches.