NANOARCH2018: INTERNATIONAL SYMPOSIUM ON NANOSCALE ARCHITECTURES 2018
PROGRAM FOR THURSDAY, JULY 19TH
Days:
previous day
all days

View: session overviewtalk overview

10:25-11:40 Session 8: Crossbar-based Architectures
Chair:
Location: Templar's
10:25
Hardware Acceleration Implementation of Sparse Coding Algorithm with Spintronic Devices
SPEAKER: Deming Zhang

ABSTRACT. In this paper, we explore the possibility of hardware acceleration implementation of sparse coding algorithm with spintronic devices by a series of design optimizations across the architecture, circuit and device. Firstly, a domain wall motion (DWM) based compound spintronic device (CSD) is engineered and modelled, which is envisioned to achieve multiple conductance states. Sequentially, a parallel architecture is presented based on a dense cross-point array of the proposed DWM based CSD, where each dictionary (D) value can be mapped into the conductance of the proposed DWM based CSD at the corresponding cross-point. Benefitting from its massively parallel read and write operation, such proposed parallel architecture can accelerate the selected sparse coding algorithm using a designed dedicated periphery read and write circuit. Experimental results show that the selected sparse coding algorithm can be accelerated by 1400* with the proposed parallel architecture in comparison with software implementation. Moreover, its energy dissipation is 8 orders of magnitude smaller than that with software implementation.

10:45
Quantum-dot Cellular Automata RAM design using Crossbar Architecture

ABSTRACT. In this paper, a new approach of RAM circuits, using Quantum-dot Cellular Automata (QCA), based on programmable crossbar architecture, is presented. In addition, a methodology for 2^n bits RAMs is presented. Using the aforementioned methodology any designer can design a RAM regardless of its size. The proposed designs utilize the benefits of QCA programmable crossbar architecture. Namely, the RAM circuit is characterized by regularity and the ability of customization. The features that the proposed RAM design methodology has, allow the designers to use the available circuit area efficiently.

11:05
Integrated Synthesis Methodology for Crossbar Arrays

ABSTRACT. Nano-crossbar arrays have emerged as area and power e cient structures with an aim of achieving high performance computing beyond the limits of current CMOS. Due to the stochastic nature of nano-fabrication, nano arrays show di erent properties both in structural and physical device levels compared to conventional technologies. Mentioned factors introduce random characteristics that need to be carefully considered by synthesis process. For in- stance, a competent synthesis methodology must consider basic technology preference for switching elements, defect or fault rates of the given nano switching array and the variation values as well as their e ects on performance metrics including power, delay, and area. Presented synthesis methodology in this study comprehen- sively covers the all speci ed factors and provides optimization algorithms for each step of the process.

11:25
Minimal Disturbed Bits in Writing Resistive Crossbar Memories

ABSTRACT. Resistive memories are promising candidates for non-volatile memories. Write disturb is one of problems that facing this kind of memories. In this paper, the write disturb problem is mathematically formulated in terms of the bias parameters and optimized analytically. A closed form solution for the optimal bias parameters is calculated. Results are compared with the 1/2 and 1/3 bias schemes showing a significant improvement.

11:45-13:00 Session 9: Novel Hardware Structures & Computation Paradigms
Location: Templar's
11:45
A Recursive Growing & Featuring Mechanism for Nanocomputing Structures

ABSTRACT. The huge amounts of physical possibilities offered by the emerging nanotechnologies must be accompanied, beyond the uniform growing mechanisms supposed by the current serial and/or parallel extensions, by an appropriate structuring mechanism able to support efficiently the increasing functional demands. A recursive growing mechanism is proposed for the upcoming Nano-Era. The current growing mechanism involves only pure quantitative aspects. We consider as mandatory, for the very big sized systems, another mechanism which interleaves the quantitative aspects with the functional ones. Because the computational parallelism is implicit for the big sized systems, the growing mechanism must be supported also by an appropriate computational model. For the current systems we started from gates. For Nano-Era structuring mechanism we will start from cellular automata. The main difference is that for nanoarchitectures the growing mechanism and the featuring mechanism are unified in a unique recursive mechanism.

12:05
Free BDD based CAD of Compact Memristor Crossbars for in-Memory Computing

ABSTRACT. The demise of Moore's law, breakdown of Dennard Scaling, dark silicon phenomenon, process variation, leakage currents and quantum tunneling are some of the hurdles faced in the further advancement of computing systems today. As a result, there is a renewed interest in alternate computing paradigms using emerging nanoelectronic devices. This work uses free binary decision diagrams (FBDDs) for computer-aided design (CAD) of compact memristive crossbars for sneak-path based in-memory computing. The absence of a fixed variable ordering makes FBDDs more compact than their ordered counterpart called reduced ordered binary decision diagrams (ROBDDs). Our design has used the size of the circuit-representation of Boolean functions for selecting different variable orderings along different paths which results in compact FBDDs. We have demonstrated our approach by designing compact crossbars for a four-bit multiplier and other RevLib benchmarks. Our synthesis process yields a 50.1% reduction in area over the previous FBDD-based synthesis for the fourth-output-bit of the multiplier. Overall, our approach has reduced the multiplier area by 20.1%.

12:25
Crosstalk based Fine-Grained Reconfiguration Techniques for Polymorphic Circuits

ABSTRACT. Truly polymorphic circuits, whose functionality/circuit behavior can be altered using a control variable, can provide tremendous benefits in multi-functional system design and resource sharing. For secure and fault tolerant hardware designs these can be crucial as well. Polymorphic circuits work in literature so far either rely on environmental parameters such as temperature, variation etc. or on special devices such as ambipolar FET, configurable magnetic devices, etc., that often result in inefficiencies in performance and/or realization. In this paper, we introduce a novel polymorphic circuit design approach where deterministic interference between nano-metal lines is leveraged for logic computing and configuration. For computing, the proposed approach relies on nano-metal lines, their interference and commonly used FETs, and for polymorphism, it requires only an extra metal line that carries the control signal. In this paper, we show a wide range of crosstalk polymorphic (CT-P) logic gates and their evaluation results. We also show an example of a large circuit that performs both the functionalities of multiplier and sorter depending on the configuration signal. Our benchmarking results are presented in this paper. For CT-P, the transistor count was found to be significantly less compared to other existing approaches, ranging from 25% to 83%. For example, CT-P AOI21-OA21 cell show 83%, 85% and 50% transistor count reduction, and Multiplier-Sorter circuit show  40%, 36% and 28% transistor count reduction with respect to CMOS, genetically evolved, and ambipolar transistor based polymorphic circuits respectively.

12:45
A Novel Analog to Digital Conversion Concept with Crosstalk Computing

ABSTRACT. Analog to Digital Converters (ADCs) are the core components of computing systems forming a link between the external stimuli and digital microprocessor operations. Current CMOS based fast ADCs are difficult to scale due to the reliance on transistor sizing and high voltage operations. They also suffer from high power consumption. In this paper, we introduce a novel ADC design which uses the deterministic signal interference between metal lines as a mechanism for signal conversion. In contrast to CMOS ADCs, our approach uses a simple crosstalk tree network of metal lines to convert sampled analog levels to digital code. Here, the sampled analog signal is passed through a input metal line which is capacitively coupled to a series of metal lines in a tree-like layout, and the coupled voltages on the edge of the tree (the leaves) determine the output. The resolution is dependent on the branches of the tree and coupling strengths in between branches. We show 2-bit and 3-bit ADC implemented through this mechanism at 16n technology node. Our results indicate the possibility of huge power savings with Crosstalk ADCs in comparison to CMOS; for 2-bit and 3-bit ADCs the power consumption was found to be 43.51uW and 96.74uW respectively at 5M Hz sampling frequency

14:30-15:45 Session 10: Energy Effective Architectures
Location: Templar's
14:30
Energy Efficiency of Low Swing Signaling for Emerging Interposer Technologies

ABSTRACT. Interconnects often constitute the major bottleneck in the design process of low power integrated circuits (IC). Although 2.5-D integration technologies support physical proximity, the dissipated power in the communication links remains high. In this work, the additional power savings for interposer-based interconnects enabled by low swing signaling is investigated. The energy consumed by a low swing scheme is, therefore, compared with a full swing solution and the critical length of the interconnect, above which the low swing solution starts to pay off, is determined for diverse interposer technologies. The energy consumption is compared for three different substrate materials, silicon, glass, and organic. Results indicate that the higher the load capacitance of the communication medium is, the greater the energy savings of the low swing circuit are. Specifically, in cases that electrostatic discharge (ESD) protection is required, the low swing circuit is always superior in terms of energy consumption due to the high capacitive load of the ESD circuit, regardless the substrate material and the link length. Without ESD protection, the highest critical length is about 380 μm for glass and organic interposers. To further explore the limits of power reduction from low swing signaling for 2.5-D ICs, the effect of typical interconnect parameters such as width and space on the energy efficiency of low swing communication is evaluated.

14:50
Energy-Efficient 4T-based SRAM Bitcell for Ultra-Low-Voltage Operations in 28 nm 3D CoolCubeTM Technology

ABSTRACT. This paper presents a 4T-based SRAM bitcell optimized both for write and read operations at ultra-low voltage (ULV). The proposed bitcell is designed to respond to the requirements of energy constrained systems, as in the case of most IoT-oriented circuits and applications. The use of 3D CoolCubeTM technology enables the design of a stable 4T SRAM bitcell by using data-dependent back biasing. The proposed bitcell architecture provides a major reduction of the write operation energy consumption compared to a conventional 6T bitcell. A dedicated read port coupled to a virtual GND (VGND) ensures a full functionality at ULV of read operations. Simulation results show reliable operations down to 0.35 V close to six sigma (6 σ) without any assist techniques (e.g. negative bitlines), achieving in worst case corner 300 ns and 125 ns in write and read access time, respectively. A 6x energy consumption reduction compared to a ULV ultra-low-leakage (ULL) 6T bitcell is demonstrated.

15:10
Energy-efficient MFCC extraction architecture in mixed-signal domain for automatic speech recognition
SPEAKER: Qin Li

ABSTRACT. This paper proposes a novel processing architecture to extract Mel-Frequency Cepstrum Coefficients(MFCC) for automatic speech recognition. Inspired by the human ear, the energy-efficient analog-domain information processing is adopted to replace the energy-intensive Fourier Transform in conventional digital-domain. Moreover, the proposed architecture extracts the acoustic features in the mixed-signal domain, which significantly reduces the cost of Analog-to-Digital Converter (ADC) and the computational complexity. We carry out the circuit-level simulation based on 180nm CMOS technology, which shows an energy consumption of 2.4 nJ per frame, and a processing speed of 45.79 us per frame. The proposed architecture achieves 97.2% energy saving and about 6.4x speedup than state of the art. Speech recognition simulation reaches the classification accuracy of 99% using the proposed MFCC features.

15:25
Power Analysis of an mRNA-Ribosome System

ABSTRACT. Energy is the heart to drive any device, such as any machine. As researchers have been trying to perform low energy operations more and more, energy requirements are turning out to be one of the key features in measuring the performance of a device. On the other hand, as conventional silicon-based computing is approaching a barrier, needs of non-conventional computing is increasing. Though several such computing platforms have arisen to prove itself as a suitable alternative to silicon-based computing, less energy requirement is certainly one of the most sought features in the competition among the new platforms. Moreover, there are certain scenarios where performing calculations in pure bio-molecular ways are highly desired. Although DNA computing has already flagged the success of bio-molecular computing in terms of energy/power requirement, its manual nature keeps it behind from other computing techniques. Another new bio-molecular computing technique \emph{Ribosomal Computing}, though still in infancy, has shown real promises due to its inherent automation. This work performs an analysis of the energy/power requirements of this computing technique. With the promising result obtained, ribosomal computing can claim itself as a promising computing technique, if combined with its inherent automation.

16:15-17:35 Session 11: Quantum Device based Computing & Architectures
Location: Templar's
16:15
Controlling distilleries in fault-tolerant quantum circuits: problem statement and analysis towards a solution

ABSTRACT. The failure susceptibility of the quantum hardware will force quantum computers to execute fault-tolerant quantum circuits. These circuits are based on quantum error correcting codes, and there is increasing evidence that one of the most practical choices is the surface code. Design methodologies of surface code based quantum circuits were focused on the layout of such circuits without emphasizing the reduced availability of hardware and its effect on the execution time. Circuit layout has not been investigated for practical scenarios, and the problem presented herein was neglected until now. For achieving fault-tolerance and implementing surface code based computations, a significant amount of computing resources (hardware and time) are necessary for preparing special quantum states in a procedure called distillation. This work introduces the problem of how distilleries (circuit portions responsible for state distillation) influence the layout of surface code protected quantum circuits, and analyses the trade-offs for reducing the resources necessary for executing the circuits. A first algorithmic solution is presented, implemented and evaluated for addition quantum circuits.

16:35
Signal Synchronization in Large Scale Quantum-dot Cellular Automata Circuits

ABSTRACT. Quantum-dot fabrication and characterization is a well-established nanotechnology, which used in photonics, quantum optics and nanoelectronics. By placing four quantum-dots on the corners of a square, a cell is formed, in which a bit of information can be stored. This cell serves as the structural device of Quantum-dot Cellular Automata (QCA) nanoelectronic circuits. After QCA presentation, several circuits have been designed and proposed in the literature. However, one of the biggest problems QCA designers have to face to pave the successful design of functional and large scale QCA circuits is signal synchronization. In this paper, a novel approach of the aforementioned problem is presented. This approach is inspired by the well known computational problem of Firing Squad Synchronization (FSS). FSS problem has many similarities with large scale QCA circuits synchronization problem. In addition, FSS problem has been studied by many researchers and many efficient solutions have been proposed in the literature.

16:55
Size Optimization of MIGs with an Application to QCA and STMG Technologies
SPEAKER: Heinz Riener

ABSTRACT. Majority-inverter graphs (MIGs) are a logic representation with remarkable algebraic and Boolean properties that enable efficient logic optimizations beyond the capabilities of traditional logic representations. Further, since many nano-emerging technologies, such as quantum-dot cellular automata (QCA) or spin torque majority gates (STMG), are inherently majority-based, MIGs serve as a natural logic representation to map into these technologies. So far, MIG optimization methods predominantly target to reduce the depth of the logic networks, corresponding to low delay implementations in the respective technologies. In this paper, we introduce several methods to optimize the size of MIGs. They can be applied such that the depth of the logic network is preserved; therefore our methods have a direct effect on the physical area, without worsening the delay. Some methods are inspired by existing size optimization algorithms for non-majority-based logic networks, others make explicit use of the majority function and its properties. All methods are Boolean-in contrast to algebraic optimization methods-which has a positive effect on the quality but challenges their implementation. Our experiments show that using our methods the size of MIGs in the EPFL combinational benchmark suite can be reduced by up to 7.12\%. When mapped to QCA and STMG technologies we reduce the average area-delay-energy product by 2.31% and 2.07%, respectively.

17:15
Representation of Qubit States using 3D Memristance Spaces: A first step towards a Memristive Quantum Simulator

ABSTRACT. Development of quantum simulators is a major step towards the universal quantum computer. Quantum simulators are quantum systems that can perform specific quantum computations, or software packages that can reproduce most of the aspects of a general universal quantum computer on a general-purpose classical computer. Development of quantum simulators using digital circuits, such as FPGAs is very difficult, mainly because the unit of quantum information, the qubit, has an infinite number of states, whereas the classical bit has only two. On the other hand, analog circuits comprising R, L and C elements have no internal state variables that can be used to reproduce and store qubit states. Here we take the first step towards the development of a new quantum simulator using memristors. The qubit state is mapped to a 3D space spanned by the memristances of three identical memristors. The qubit state evolution is reproduced by the input voltages applied to the memristors. We define the correspondence between the general qubit state rotation, i.e. the one-qubit quantum gates, and memristor input voltage variations and reproduce the rotations imposed by the action of quantum gates in the 3D memristance space. Our results show that, at least in principle, qubits and one-qubit quantum gates can be simulated by memristors.