NANOARCH2018: INTERNATIONAL SYMPOSIUM ON NANOSCALE ARCHITECTURES 2018
PROGRAM FOR WEDNESDAY, JULY 18TH

10:45-12:05 Session 3: Error Models & Reliability Evaluation
Location: Templar's
10:45
Fast Estimations of Failure Probability Over Long Time Spans

ABSTRACT. Shrinking of device dimensions has undoubtedly enabled the very large scale integration of transistors on electronic chips. However, it has also brought to the surface time-zero and time-dependent variation phenomena that degrade system performance and threaten functional operation. Hence, it is crucial to capture and describe these mechanisms and to model their impact effectively. To this end, we follow existing models and propose a complete framework that evaluates the failure probability of electronic components. To assess our framework, a case study of packet-switched Network on Chip (NoC) routers is presented, studying the failure probability of their SRAM buffers.

11:05
A Probabilistic Error Model and Framework for Approximate Booth Multipliers

ABSTRACT. Approximate computing is a paradigm for high performance and low power design by compromising computational accuracy. In this paper, the structure of an approximate modified radix-4 Booth multiplier is analyzed. A probabilistic error model is proposed to facilitate the evaluation of the approximate multiplier for errors from the approximate radix-4 Booth encoding, the approximate regular partial product array, and the approximate 4-2 compressor. The normalized mean error distances (NMEDs) of 8-bit and 16-bit approximate designs are found by utilizing the proposed model. The results from the error model and the corresponding analytical framework are close to those found by simulation, thus confirming the validity of the proposed approach.
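
As a rough illustration of the NMED metric named in this abstract, the sketch below exhaustively compares a hypothetical approximate multiplier with exact multiplication. The truncation-based `approx_mul` is an assumption made only for illustration and is not the approximate Booth design from the paper; normalizing by the largest exact product magnitude is one common convention and is likewise assumed here.

```python
from itertools import product


def approx_mul(a, b, dropped_bits=4):
    """Hypothetical approximate multiplier: clear the low bits of the exact product."""
    return ((a * b) >> dropped_bits) << dropped_bits


def nmed(n_bits, mul=approx_mul):
    """Normalized mean error distance over all signed n-bit operand pairs."""
    lo = -(1 << (n_bits - 1))
    hi = (1 << (n_bits - 1)) - 1
    operands = range(lo, hi + 1)
    total_error = 0
    for a, b in product(operands, repeat=2):
        total_error += abs(mul(a, b) - a * b)  # error distance for one input pair
    mean_error = total_error / (len(operands) ** 2)
    max_product = lo * lo  # largest magnitude of an exact signed product
    return mean_error / max_product


if __name__ == "__main__":
    print(f"NMED of the 8-bit illustrative multiplier: {nmed(8):.6f}")
```

Exhaustive enumeration is practical for 8-bit operands (65,536 pairs) but not for 16-bit ones, which is one reason an analytical error model of the kind the abstract describes is attractive.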

11:25
Variability-Tolerant Memristor-based Ratioed Logic in Crossbar Array

ABSTRACT. The advent of the first TiO2-based memristor in 2008 revived scientific interest in this device technology from both academia and industry, with several emerging applications including logic circuits. Several memristive logic families have been proposed, each with different attributes, in the current quest for energy-efficient computing systems of the future. However, the limited endurance of memristor devices and their variations (both cycle-to-cycle and device-to-device) are important parameters to be considered in the evaluation of such logic families. In this work we build upon a well-known accurate physics-based model of a bipolar metal-oxide resistive RAM device (supporting parasitics of the device structure and variability of switching voltages and resistance states) and use it to show how the performance of memristor-based logic circuits can be degraded owing to both variability and state drift. Based on previous work on CMOS-like memristive logic circuits, we propose a memristive ratioed logic scheme that is crossbar-compatible, i.e. suitable for in-/near-memory computing, and tolerant to device variability; it also does not affect device endurance, since computations do not involve switching the memristor states. As a figure of merit, we compare this new logic scheme with MAGIC, focusing on the universal NOR logic gate.

11:45
High-Endurance Bipolar ReRAM-Based Non-Volatile Flip-Flops with Run-Time Tunable Resistive States

ABSTRACT. ReRAM technologies feature desired properties, e.g. fast switching and high read margin, that make them attractive candidates to be used in non-volatile flip-flops (NVFFs). However, they suffer from limited endurance. Therefore, cell degradation considerations are a necessity for practical deployment in non-volatile processors (NVPs). In this paper, we present two bipolar ReRAM-based NVFFs, Hypnos and Morpheus, with enhanced endurance and energy efficiency. Hypnos reduces the ReRAM electrical stress during set operation while keeping the imposed NVFF area overhead at a minimum. In Morpheus, a write-termination circuit is used to further enhance the ReRAM endurance and energy efficiency at the cost of an affordable area overhead. Moreover, both NVFFs feature run-time tunable resistive states to enable on-line adjustment of the trade-off among endurance, retention, energy consumption, and restore success rate (in case of approximate computing). Experimental results demonstrate that Hypnos reduces the ReRAM set degradation by 91%, on average. Moreover, the write-termination mechanism in Morpheus further reduces the remaining degradation by 93%/97% in set/reset operation, on average. The results also demonstrate enhanced energy efficiency in both NVFFs.

12:10-13:00 Session 4: Neural Circuits & Applications
Location: Templar's
12:10
An aging resilient neural network architecture

ABSTRACT. Recent artificial neural network architectures use memristors to store synaptic weights. The crossbar structure of memristors is used because of its dense structure and extreme parallelism. Transistor aging impacts the computational accuracy of these architectures. An enhancement of the memristor-based neural network architecture is introduced using a built-in current-based calibration circuit. It is shown experimentally that the proposed approach alleviates the cell aging effect.

12:30
Overcoming Crossbar Nonidealities in Binary Neural Networks Through Learning

ABSTRACT. Crossbar nonidealities may considerably degrade the accuracy of the matrix multiplication operation, which is the cornerstone of hardware-accelerated neural networks. In this paper, we show that crossbar nonidealities, especially the wire resistance, should be taken into consideration for accurate evaluation. We also present a simple yet highly effective way to capture the wire resistance effect for the inference and training of deep neural networks without extensive SPICE simulations. Different scenarios have been studied and used to show the efficacy of our proposed method.
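
For context, the matrix multiplication that a crossbar performs can be sketched in its ideal form, where each column current is the sum of input voltages weighted by cell conductances. The function name and the example values below are assumptions for illustration; wire resistance, the nonideality the abstract highlights, is deliberately not modeled, so this is only the baseline from which a real array deviates.

```python
def crossbar_currents(voltages, conductances):
    """Ideal column currents: I[j] = sum over rows i of V[i] * G[i][j].

    voltages:     list of row input voltages (volts)
    conductances: list of rows, each a list of cell conductances (siemens)
    Wire resistance and other nonidealities are NOT modeled here.
    """
    n_cols = len(conductances[0])
    return [
        sum(v * row[j] for v, row in zip(voltages, conductances))
        for j in range(n_cols)
    ]


if __name__ == "__main__":
    # Illustrative 2x2 array with made-up values.
    print(crossbar_currents([1.0, 0.5], [[1.0, 2.0], [3.0, 4.0]]))
```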

12:45
Real-Time Trainable Data Converters for General Purpose Applications
SPEAKER: Loai Danial

ABSTRACT. Data converters are ubiquitous in data-abundant systems, where they are heterogeneously distributed across the analog-digital interface. Unfortunately, conventional data converters trade off speed, power, and accuracy. Furthermore, intrinsic real-time and post-silicon variations dramatically degrade their performance. In this paper, we employ novel neuro-inspired approaches to design smart data converters that can be trained in real time for general-purpose applications, using machine learning algorithms and artificial neural network architectures. Our approach integrates emerging memristor technology with CMOS. This concept will pave the way towards interfaces that adapt to the continuously varying conditions of data-driven applications.

14:30-15:45 Session 5: Non-CMOS Logic Circuits
Location: Templar's
14:30
Programmable Molecular-Nanoparticle Multi-junction Networks for Logic Operations

ABSTRACT. We propose and investigate a nanoscale multi-junction network architecture that can be configured on the fly to perform Boolean logic functions at room temperature. The device exploits the electronic properties of randomly deposited molecule-interconnected metal nanoparticles, which act collectively as strongly nonlinear single-electron transistors. Disorder is incorporated in the modeling of their electrical behavior and the collective response of interacting nano-components is rationalized. The non-optimized energy consumption of the synaptic grid for a "then-if" logical computation is in the range of a few aJ.

14:50
Multi-Valued Logic Circuits on Graphene Quantum Point Contact Devices

ABSTRACT. Graphene quantum point contacts (G-QPCs) combine switching operations with quantized conductance, which can be modulated by top and back gates. Here we use the conductance quantization to design and simulate multi-valued logic (MVL) circuits and, more specifically, an adder. The adder comprises two G-QPCs connected in parallel. We compute the conductance of the adder for various inputs and show that graphene MVL circuits are feasible.

15:10
Sequential Circuit Design with Bilayer Avalanche Spin Diode Logic

ABSTRACT. Novel computing paradigms like the fully cascadable InSb bilayer avalanche spin-diode logic (BASDL) are capable of performing complex logic operations. Although the original work provides a comprehensive explanation for the device structure, the fundamental logic set and basic combinational circuits, it lacks the inclusion of sequential circuit design. This paper addresses the void by demonstrating the structural design of SR and D-type latches with BASDL. Novel latch topologies are proposed that take full advantage of the BASDL-based logic set while maintaining conventional latch functionality. The effective operation of these latches is verified through a complete logic-level analysis and a brief insight into their physical implementation.

15:25
Complementary Arranged Graphene Nanoribbon-based Boolean Gates

ABSTRACT. With CMOS feature size heading towards atomic dimensions, unjustifiable static power, reliability, and economic implications are being exacerbated, prompting research and development on new materials, devices, and/or computation paradigms. Within this context, Graphene Nanoribbons (GNRs), owing to graphene’s excellent electronic properties, may serve as basic blocks for carbon-based nanoelectronics. In this paper we build upon the fact that GNR behaviour can be controlled according to some desired functionality via top/back gate contacts and propose to combine GNRs with complementary functionalities to construct Boolean gates. To this end, we introduce a generic GNR-based Boolean gate structure, composed of two GNRs, i.e., a pull-up GNR performing the gate Boolean function and a pull-down GNR performing the gate inverted Boolean function. Subsequently, by properly adjusting the GNRs’ dimensions and topology, we design 2-input AND, NAND, and XOR graphene-based Boolean gates, as well as 1-input gates, i.e., inverter and buffer. Our SPICE simulations indicate that the proposed gates exhibit a smaller propagation delay, from 23% for the XOR gate to 6x for the AND gate, and 2 orders of magnitude smaller power consumption, when compared with 7nm CMOS-based counterparts, while requiring a 1 to 2 orders of magnitude smaller active area footprint. These results clearly indicate that GNR-based gates have great potential as basic building blocks for future beyond-CMOS energy-efficient nanoscale circuits.

16:15-17:35 Session 6: Advanced Memory Architectures
Location: Templar's
16:15
CCE: A Combined SRAM and Non Volatile Cache for Endurance of Next Generation Multilevel Non Volatile Memories in Embedded Systems

ABSTRACT. In this paper we present Combined Cache for Endurance (CCE), a scheme to enable the use of next generation high density multilevel non volatile memories in embedded systems. These memories are attractive as they can reduce the static power consumption dramatically, and a single memory can potentially be used, avoiding the need for both flash and SRAM or DRAM in a system. However, a common drawback of the new multilevel non volatile memories is that they support a limited number of write operations, and thus their endurance needs to be improved to make them a viable alternative for the main memory of embedded systems. The proposed CCE relies on the fact that most writes are concentrated on a few addresses. Therefore, a small SRAM cache can be used to store positions that are frequently written. However, this alone would not preserve the non volatile nature of the memory. To preserve it, in the proposed CCE the cache cell has an SRAM part and a non volatile part. At power up the contents of the non volatile part are copied to the SRAM, and the other way around at power down. As many embedded systems execute predictable workloads, this cache is statically set to cover the most frequently written addresses. The evaluation shows that CCE can increase the endurance of the memory by several orders of magnitude. At the same time, the overheads required to implement the cache are small relative to the main memory. Therefore, CCE can be an interesting option to improve the endurance of next generation high density multilevel non volatile memories.
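
The write-filtering idea described in this abstract can be sketched as follows. This is a simplified model under stated assumptions, not the authors' implementation: the function names are hypothetical, the write trace is a plain list of addresses, and the copy from the SRAM part to the non volatile part at power down is counted as a single non volatile write per pinned address that was written.

```python
from collections import Counter


def pick_hot_addresses(write_trace, cache_entries):
    """Statically choose the most frequently written addresses (profiling step)."""
    hottest = Counter(write_trace).most_common(cache_entries)
    return frozenset(addr for addr, _ in hottest)


def nonvolatile_write_counts(write_trace, pinned_addresses=frozenset()):
    """Count non volatile writes per address over one power cycle.

    Writes to pinned addresses are absorbed by the SRAM part of the cache;
    each pinned address that was written costs one non volatile write when
    the SRAM contents are copied back at power down (modeling assumption).
    """
    counts = Counter()
    dirty = set()
    for addr in write_trace:
        if addr in pinned_addresses:
            dirty.add(addr)      # absorbed by SRAM
        else:
            counts[addr] += 1    # goes straight to the non volatile memory
    for addr in dirty:
        counts[addr] += 1        # power-down copy to the non volatile part
    return counts


if __name__ == "__main__":
    trace = [0] * 1000 + [1] * 10 + [2] * 5   # made-up, heavily skewed trace
    hot = pick_hot_addresses(trace, cache_entries=1)
    print(max(nonvolatile_write_counts(trace).values()))
    print(max(nonvolatile_write_counts(trace, hot).values()))
```

Since endurance is limited by the most-written cell, the drop in the maximum per-address count is the quantity of interest in this toy model.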

16:35
Regular Expression Matching with Memristor TCAMs for Network Security

ABSTRACT. We propose using memristor-based TCAMs (Ternary Content Addressable Memory) to accelerate Regular Expression (RegEx) matching. RegEx matching is a key function in network security, where deep packet inspection finds and filters out malicious actors. However, RegEx matching latency and power can be very high, and current proposals are challenged to perform wire-speed matching for large scale rulesets. Our approach dramatically decreases RegEx matching operating power and provides high throughput; the use of mTCAMs also enables novel compression techniques to expand ruleset sizes and allows future exploitation of the multi-state (analog) capabilities of memristors. We fabricated and demonstrated nanoscale memristor TCAM cells. SPICE simulations investigate mTCAM performance at scale, and an mTCAM power model at 22nm demonstrates 0.2 fJ/bit/search energy for a 36x400 mTCAM. We further propose a tiled architecture which implements a Snort ruleset and assess the application performance. Compared to a state-of-the-art FPGA approach (2 Gbps, ~1W), we show 4x throughput (8 Gbps) at about 60% of the power (0.62W) before applying standard TCAM power-saving techniques. Our performance comparison improves further when striding (searching multiple characters) is considered, resulting in 47.2 Gbps at 1.3W for our approach compared to 3.9 Gbps at 630mW for the strided FPGA NFA, demonstrating a promising path to wire-speed RegEx matching on large scale rulesets.
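
The ternary matching that a TCAM performs can be sketched in software as below. This is only an illustration of the general TCAM search semantics, with each stored entry written as a string over `0`, `1`, and a don't-care symbol `x`; the function names and the string encoding are assumptions, and how the paper maps RegEx rules onto TCAM entries is not shown.

```python
def ternary_match(key, pattern):
    """True if every pattern symbol equals the key bit or is the don't-care 'x'."""
    return len(key) == len(pattern) and all(
        p == "x" or p == k for k, p in zip(key, pattern)
    )


def tcam_search(key, entries):
    """Return the indices of all stored entries that match the key.

    A hardware TCAM compares the key against every entry in parallel;
    this loop only reproduces the result, not the timing.
    """
    return [i for i, pattern in enumerate(entries) if ternary_match(key, pattern)]


if __name__ == "__main__":
    entries = ["10x1", "xxxx", "0110"]   # made-up 4-bit entries
    print(tcam_search("1011", entries))
    print(tcam_search("0110", entries))
```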

16:55
Ring-Shaped Racetrack Memory: From Circuits and Systems Perspective
SPEAKER: Yue Zhang

ABSTRACT. Information storage and transfer via current-induced domain wall (DW) motions exhibit significant density-speed-energy advantages, which have inspired numerous emerging devices and circuits, such as racetrack memory (RM). However, the bi-directional propagation of DWs in the conventional strip-shaped nanowire leads to a data overflow issue, which implicitly increases physical size and deteriorates operational performance. In this paper, we study a ring-shaped RM based on spin orbit torque (SOT) driven chiral DW motions from a circuits and systems perspective. Thanks to the circuit simplification and high efficiency of chiral DW motions, its area, operational speed and energy consumption can be greatly improved. A ring-shaped content addressable memory (CAM) has been designed and simulated as a circuit example. As shown by the 4-core system experiment, the ring-shaped RM cache improves instructions per cycle (IPC) by 39.0% and saves 46.0% energy compared with a conventional SRAM cache.
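
The data overflow issue mentioned in this abstract can be pictured with a purely logical toy model, sketched below under strong simplifying assumptions: a track is a list of bits, a shift moves every bit by the same number of positions, and device physics is ignored. The function names are hypothetical. On an open strip of fixed length, bits pushed past an end are lost unless extra cells are reserved, whereas on a ring they wrap around.

```python
from collections import deque


def shift_ring(track, steps):
    """Shift bits around a closed ring; bits wrap around and none are lost."""
    ring = deque(track)
    ring.rotate(steps)
    return list(ring)


def shift_strip(track, steps):
    """Shift bits along an open strip of the same length.

    Bits pushed past either end are lost; None marks a cell left empty.
    """
    n = len(track)
    if steps >= 0:
        k = min(steps, n)
        return [None] * k + track[: n - k]
    k = min(-steps, n)
    return track[k:] + [None] * k


if __name__ == "__main__":
    data = [1, 0, 1, 1]
    print(shift_ring(data, 1))
    print(shift_strip(data, 1))
```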

17:15
A Novel Cross-point MRAM with Diode Selector Capable of High-Density, High-Speed, and Low-Power In-Memory Computation
SPEAKER: Wang Kang

ABSTRACT. In-Memory Computation (IMC), which is capable of reducing the power consumption and bandwidth requirement resulting from the data transfer between the processing and memory units, has been considered a promising technology to break the von Neumann bottleneck. In order to develop an effective and efficient IMC platform, the performance of the memory itself, such as its density, operation speed and power consumption, is one of the most important factors. In this work, we report a cross-point magnetic random access memory (MRAM) with diode selector for IMC implementation. The memory cell consists of a magnetic tunnel junction (MTJ) device and a diode connected in series. The memory cells are arranged in a cross-point array structure, providing high storage density. The MTJ can be switched through the unipolar precessional voltage-controlled magnetic anisotropy (VCMA) effect, thus enabling high speed and low power. Further, Boolean logic functions can be realized via regular memory-like write and read operations. The feasibility and performance of the proposed IMC in the cross-point MRAM are demonstrated with hybrid VCMA-MTJ/CMOS circuit simulations at the 40 nm technology node.