ICRC 2023: THE 8TH IEEE INTERNATIONAL CONFERENCE ON REBOOTING COMPUTING
PROGRAM FOR TUESDAY, DECEMBER 5TH

08:30-09:00 Session 1: Registration & Breakfast (Day 1)

Location: Amici (4th Floor) | Sponsored By: Vaire Computing (vaire.co)

09:00-10:40 Session 2: IRDS #1

Location: Amici (4th Floor)

09:00
Welcome and IRDS Past, Present, and Future
09:50
The Engine of the Chips Acts Around the World
10:40-10:55 Coffee Break

Location: Amici (4th Floor)

10:55-12:00 Session 3: IRDS #2

Location: Amici (4th Floor)

10:55
Beyond CMOS/Emerging Materials Integration with ties to ESHS and Energy Harvesting
11:25
Beyond CMOS Technology Issues and Critical Considerations
11:55
Closing Remarks
12:00-13:00 Lunch Break & Registration

Location: Above Ash (16th Floor)

13:00-14:00 Session 4: Keynote (Day 1)

Location: Amici (4th Floor)

13:00
Deep Codesign in the Post-Exascale Computing Era

ABSTRACT. DOE has just deployed its first Exascale system at ORNL, so now is an appropriate time to revisit our Exascale predictions from over a decade ago and think about post-Exascale. We are now seeing a Cambrian explosion of new technologies during this ‘golden age of architectures,’ making codesign of architectures with software and applications more critical than ever. In this talk, I will revisit the Exascale trajectory, survey post-Exascale technologies, and discuss their implications for both system design and software. As an example, I will describe Abisko, a new microelectronics codesign project that focuses on designing a chiplet for analog spiking neural networks using novel neuromorphic materials.

14:00-14:15 Coffee Break

Location: Amici (4th Floor)

14:15-15:55 Session 5: Neuromorphic Computing (Algorithms & Applications)

Location: Amici (4th Floor)

14:15
Synaptic Sampling of Neural Networks
PRESENTER: James Aimone

ABSTRACT. Probabilistic artificial neural networks offer intriguing prospects for enabling the uncertainty of artificial intelligence methods to be described explicitly in their function; however, the development of techniques that quantify uncertainty by well-understood methods such as Monte Carlo sampling has been limited by the high costs of stochastic sampling on deterministic computing hardware. Emerging computing systems that are amenable to hardware-level probabilistic computing, such as those that leverage stochastic devices, may make probabilistic neural networks more feasible in the not-too-distant future. This paper describes the scANN technique---sampling (by coinflips) artificial neural networks---which enables neural networks to be sampled directly by treating the weights as Bernoulli coinflips. This method, natively well suited to probabilistic computing techniques that focus on tunable stochastic devices, nearly matches fully deterministic performance while also describing the uncertainty of correct and incorrect neural network outputs.
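
The sampling step itself is compact. Below is a minimal, illustrative numpy sketch of the idea as described in the abstract; the layer shape, the sign/magnitude reading of the weights, and the sample count are our assumptions, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 16-input, 8-output layer with weights in [-1, 1],
# read as signed Bernoulli probabilities: |w| is the flip probability.
W = rng.uniform(-1.0, 1.0, size=(16, 8))
x = rng.uniform(0.0, 1.0, size=16)

def sample_layer(W, x, n_samples=256):
    """Average layer outputs over Bernoulli 'coinflip' samples of the weights."""
    outs = []
    for _ in range(n_samples):
        flips = rng.random(W.shape) < np.abs(W)  # each weight fires w.p. |w|
        outs.append((np.sign(W) * flips).T @ x)  # sampled weight keeps its sign
    outs = np.array(outs)
    # The sample mean approximates the deterministic output W.T @ x,
    # and the sample spread quantifies the output uncertainty.
    return outs.mean(axis=0), outs.std(axis=0)

mean, std = sample_layer(W, x)
print(np.allclose(mean, W.T @ x, atol=0.5), std.max())
```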

14:35
A Memcomputing Approach to Prime Factorization
PRESENTER: Fabio Traversa

ABSTRACT. We report preliminary results on using the MEMCPU Platform to compute the prime factorization of large biprimes. The approach described here uses a congruence model that returns smooth congruences to address the bottleneck of standard sieve methods. The model has size-dependent structure, and the MEMCPU Platform requires structure-dependent tuning for optimal performance. Therefore, we tuned the platform on sample problems up to a given size according to available resources. Then we generated RSA-like benchmark biprimes to perform rigorous scaling analysis. The MEMCPU timings over the tuned range followed low degree polynomials in the number of bits, markedly different from other tested methods including general number field sieve. MEMCPU's model was scaled up to 300-bit factorization problems while following a 2nd degree polynomial fit. We also discuss the approach to tuning the MEMCPU Platform for problems beyond the reach of today's most advanced methods. Finally, basic analysis of the acceleration expected from an ASIC implementation is provided and suggests the possibility of real time factorization of large biprimes.
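
The scaling analysis reduces to fitting (bit-length, runtime) pairs with a low-degree polynomial. A hedged illustration of that fit follows; the timing numbers are synthetic, generated from a quadratic purely to show the procedure, and are not MEMCPU measurements.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic (bits, seconds) pairs drawn from a quadratic -- illustrative only.
bits = np.arange(50, 301, 25)
t = 0.002 * bits**2 + 0.05 * bits + rng.normal(0.0, 5.0, size=bits.size)

# 2nd-degree polynomial fit in the number of bits, as in the paper's analysis.
coeffs = np.polyfit(bits, t, deg=2)
residual = t - np.polyval(coeffs, bits)
print("fit: t ~ %.4f b^2 + %.3f b + %.2f" % tuple(coeffs))
print("max residual: %.2f s" % np.abs(residual).max())
```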

14:55
Enhancing Spiking Deep Learning with Dendrites
PRESENTER: Mark Plagge

ABSTRACT. Deep learning and large-scale neural networks are rapidly transforming the space of computing. While revolutionary in practice, the implementations of these networks are still constrained by the end of Dennard scaling and the limits of hardware performance scaling. To help solve this problem, there has been a great deal of work on developing computationally efficient deep learning techniques, including reducing the total number of MAC operations, leveraging low-energy acceleration hardware devices, and other techniques. In addition to these methods, there is active research on Spiking Neural Networks (SNNs), a category of brain-inspired machine learning algorithms that leverage event-driven binary communication. These networks generally use simple Leaky Integrate-and-Fire (LIF) neuron models, which have limited expressivity; SNNs using LIF neurons tend to perform worse than SOTA deep learning models but have the potential for much higher levels of energy efficiency.
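
For readers outside the field, the LIF model the abstract refers to is only a few lines in discrete time; a minimal sketch follows (the time constant, threshold, and reset values here are arbitrary):

```python
import numpy as np

def lif_step(v, i_in, tau=20.0, v_th=1.0, v_reset=0.0, dt=1.0):
    """One discrete-time leaky integrate-and-fire update for a vector of neurons."""
    v = v + dt * (-v / tau + i_in)   # leak toward rest, integrate input current
    spike = v >= v_th                # fire when the membrane crosses threshold
    v = np.where(spike, v_reset, v)  # reset any neuron that fired
    return v, spike

v = np.zeros(4)
currents = np.array([0.0, 0.02, 0.05, 0.1])
for t in range(50):
    v, s = lif_step(v, currents)
    if s.any():
        print(f"t={t}: neurons {np.flatnonzero(s)} spiked")
```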

15:05
Dendrite-inspired Computing to improve Resilience of Neural Networks to Faults in Emerging Memory Technologies
PRESENTER: Zachary Susskind

ABSTRACT. Mimicking biological neurons by focusing on the excitatory/inhibitory decoding performed by dendritic trees offers an intriguing alternative to the traditional integrate-and-fire McCulloch-Pitts neuron stylization. Weightless Neural Networks (WNNs), which rely on value lookups from tables, emulate the integration process in dendrites and have demonstrated notable advantages in terms of energy efficiency.

In this paper, we delve into the WNN paradigm from the perspective of reliability and fault tolerance. Through a series of fault injection experiments, we illustrate that WNNs exhibit remarkable resilience to both transient (soft) errors and permanent faults. Notably, WNN models experience minimal deterioration in accuracy even when subjected to fault rates of up to 5%. This resilience makes them well-suited for implementation in emerging memory technologies for binary or multiple bits-per-cell storage with reduced reliance on memory block-level error resilience features.

By offering a novel perspective on neural network modeling and highlighting the robustness of WNNs, this research contributes to the broader understanding of fault tolerance in neural networks, particularly in the context of emerging memory technologies.
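
Because a weightless neuron is a lookup table indexed by a binary tuple, the fault-injection methodology is easy to picture. The toy sketch below illustrates it with an invented table size and fault rates; it is not the authors' experimental setup.

```python
import numpy as np

rng = np.random.default_rng(2)

# A weightless "RAM neuron": a lookup table over 8-bit input tuples.
table = rng.integers(0, 2, size=256).astype(np.uint8)

def respond(table, x_bits):
    """Look up the neuron's response for a binary input tuple."""
    idx = int("".join(map(str, x_bits)), 2)
    return table[idx]

def inject_faults(table, rate):
    """Flip each stored bit independently with the given probability."""
    flips = rng.random(table.size) < rate
    return table ^ flips.astype(np.uint8)

inputs = rng.integers(0, 2, size=(1000, 8))
clean = np.array([respond(table, x) for x in inputs])
for rate in (0.01, 0.05, 0.10):
    faulty_table = inject_faults(table, rate)
    faulty = np.array([respond(faulty_table, x) for x in inputs])
    print(f"fault rate {rate:.0%}: output agreement {(clean == faulty).mean():.1%}")
```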

15:15
BuzzSort: A Linear-Time, Event-Driven Data Conversion and Sorting Framework for Approximate Computing Architectures

ABSTRACT. Analog computational primitives, such as vector-matrix multipliers (VMMs), are foreseen to play a pivotal role in economizing computing; however, to improve the viability of general-purpose accelerators, there is a need for efficient data conversion and sorting during readout. This work introduces "BuzzSort," an event-driven framework that simultaneously converts and sorts data from analog systems. BuzzSort acquires analog data, retrieves sorting indices, and produces a sorted output vector in linear time. We experimentally demonstrate and characterize the efficacy of BuzzSort with a field-programmable analog array (FPAA) in a 350 nm process and a field-programmable gate array (FPGA).
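
The abstract does not detail the mechanism, but a common way event-driven converters sort "for free" is a race: larger analog values cross threshold sooner, so the order in which events arrive is already the sorting permutation, with no comparison sort in hardware. The sketch below is our speculative software model of that idea, not BuzzSort's circuit; the heap only replays the event timeline.

```python
import heapq

# Hypothetical analog readout values (e.g., column currents from a VMM).
values = [0.42, 0.91, 0.13, 0.77, 0.55]

# Model each channel as firing at a time inversely related to its value;
# in hardware the events simply arrive in time order, so consuming the
# event stream yields both the sorting indices and the sorted vector.
events = [(1.0 / v, idx) for idx, v in enumerate(values)]
heapq.heapify(events)

order, sorted_vals = [], []
while events:
    _, idx = heapq.heappop(events)
    order.append(idx)               # sorting indices, largest value first
    sorted_vals.append(values[idx])

print(order)        # [1, 3, 4, 0, 2]
print(sorted_vals)  # [0.91, 0.77, 0.55, 0.42, 0.13]
```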

15:25
Arithmetic Primitives for Efficient Neuromorphic Computing
PRESENTER: Ahna Wurm

ABSTRACT. Neuromorphic computing is steadily gaining popularity in many scientific and engineering disciplines. However, one of the biggest problems that has prevented more widespread usage of neuromorphic computing is the lack of efficient encoding methods. Traditional encoding methods such as binning, rate encoding, and temporal encoding are based on unary encoding and generate a large number of spikes for certain applications, making them less energy efficient. The lack of better encoding methods has also prevented preprocessing operations from being carried out on neuromorphic computers. As a result, more than 99% of the time can be spent on data preprocessing and data transfer operations in some cases, leading to a highly inefficient workflow. In this paper, we present preliminary results that would enable us to efficiently encode data and perform basic arithmetic operations on neuromorphic computers. First, we present a neuromorphic approach for the two's complement encoding of numbers and leverage it to devise addition and multiplication circuits, which could be used in preprocessing operations on neuromorphic computers. We test our approach on the SuperNeuroMAT simulator. Our results indicate that two's complement is a highly efficient encoding method in terms of time, space, and energy complexity, and that the addition and multiplication circuits produce accurate results on two numbers of arbitrary precision.
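
As a plain-software reference for what the proposed spiking circuits compute, two's complement encoding and ripple-carry addition look like this (an ordinary bit-level illustration, not the neuromorphic implementation):

```python
def to_twos_complement(x, n_bits):
    """Encode a signed integer as an n-bit two's complement bit list (MSB first)."""
    return [(x >> i) & 1 for i in range(n_bits - 1, -1, -1)]

def add_twos_complement(a_bits, b_bits):
    """Ripple-carry addition of two equal-width two's complement numbers."""
    carry, out = 0, []
    for a, b in zip(reversed(a_bits), reversed(b_bits)):
        s = a ^ b ^ carry                    # sum bit of the full adder
        carry = (a & b) | (carry & (a ^ b))  # carry-out of the full adder
        out.append(s)
    return list(reversed(out))               # overflow wraps, as in hardware

a, b = to_twos_complement(-3, 8), to_twos_complement(5, 8)
print(add_twos_complement(a, b))  # [0, 0, 0, 0, 0, 0, 1, 0] == 2
```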

15:35
This Is The Way: Routing Neuroscience to Neuromorphic Applications
PRESENTER: Frances Chance

ABSTRACT. Neuromorphic computing has a long history of drawing inspiration from neuroscience models of single neurons, as shown in Fig. 1. However, as advances in neuroscience produce a growing wealth of data detailing the operations of biological neurons and neural circuits, neuromorphic computing has lagged behind in incorporating new understanding of biological neurons. For example, state-of-the-art digital neuromorphic platforms largely rely upon simple but highly scalable leaky integrate-and-fire (LIF) neuron models, yet it is widely known that biological neurons are much more complex. We believe that next-generation paradigms in neuromorphic computing will require approaches that draw thoughtful and intentional inspiration from biological systems at a more holistic level.

15:45
Performance comparison of memristor crossbar-based analog and FPGA-based digital weight-memory-less neural networks
PRESENTER: Chinamu Kawano

ABSTRACT. Memristor crossbar arrays have been studied as hardware accelerators for the efficient inference and learning of neural networks, eliminating the need to load network weights from memory. Conversely, FPGAs also facilitate the implementation of weight-memory-less neural networks by embedding weights within the combinational circuit. However, to the best of our knowledge, no studies have quantitatively compared the inference performance of memristor crossbar arrays and FPGAs under identical conditions. In this paper, we examine the inference performance of neural networks implemented with crossbar arrays and FPGAs, focusing on inference latency and energy per inference. Experimental results show that the crossbar-based implementation achieves slightly lower latency and significantly reduces energy consumption compared to the FPGA implementation. Notably, the energy difference ranges from 10.4X to 22.0X in our test scenarios involving small neural networks with layers possessing tens of inputs and outputs.
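
The operation benchmarked on both substrates is a vector-matrix multiply, which a crossbar computes in one step via Ohm's and Kirchhoff's laws. A schematic numpy sketch follows; the conductance range and array size are invented.

```python
import numpy as np

rng = np.random.default_rng(3)

# Crossbar of memristor conductances G (siemens): rows = inputs, columns = outputs.
G = rng.uniform(1e-6, 1e-4, size=(32, 16))  # hypothetical 32x16 array
v = rng.uniform(0.0, 0.2, size=32)           # input voltages applied to the rows

# Each column current sums v_i * G_ij (Ohm's law per cell, Kirchhoff's current
# law at the column wire): the array yields the whole product in one step.
i_out = v @ G
print(i_out.shape, i_out[:4])
```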

15:55-16:10 Coffee Break

Location: Amici (4th Floor)

16:10-17:50 Session 6: Neuromorphic Computing (Hardware)

Location: Amici (4th Floor)

16:10
PimCity: A Compute in Memory Substrate featuring both Row and Column Parallel Computing
PRESENTER: Husrev Cilasun

ABSTRACT. Processing-in-memory (PIM) substrates perform computation directly within the memory array and feature high levels of parallelism, enabling high performance and energy efficiency. However, these substrates have limitations. They typically only support parallel computing in one dimension of the memory array, where computation is performed along either multiple rows or multiple columns (but not both). This restricts data layout and makes intra-array intermediate data transfers during computation inevitable; these require reads and writes, even for short-distance data movement. Inter-array data transfers become a problem for larger-scale algorithms which use more than one PIM array, where data must also be transferred between arrays, incurring large communication overheads and increasing the complexity of the peripheral circuitry and interconnection network between arrays. Intermediate data transfers of any form limit scalability and efficiency. In this paper, we overcome this limitation and introduce PimCity, a new PIM substrate which features two-dimensional (2D) compute capability, able to compute in both the rows and the columns of the memory array. We provide the design details at scale, where computation is distributed over many arrays, giving rise to a tiled architecture. 2D PIM enables intra-array data transfer (memory) operations to be replaced by low-overhead logic inside the memory array. The tiled architecture expands this capability to allow logic operations to be performed across neighboring memory arrays, which in turn enables low-cost inter-array data transfers. The result is a high-performance, energy-efficient, high-density, and scalable PIM architecture which is suitable for both high-performance computing and embedded IoT-style applications.
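
Functionally, the 2D capability means the same array serves bulk bitwise operations along either axis without transposing data. The toy numpy sketch below shows only what the row-parallel and column-parallel operations compute; it does not model the substrate's circuit-level mechanism.

```python
import numpy as np

rng = np.random.default_rng(4)
array = rng.integers(0, 2, size=(8, 8), dtype=np.uint8)  # one PIM subarray

# Row-parallel compute: one logic operation across two rows, all columns at once.
row_and = array[0] & array[1]

# Column-parallel compute: one logic operation across two columns, all rows at once.
col_or = array[:, 0] | array[:, 1]

# A 1D-only substrate would need explicit reads/writes (or a transpose) to get
# the second case; a 2D substrate computes both in place.
print(row_and, col_or)
```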

16:30
Memristor-Based Dual Conditional In-Memory Computing
PRESENTER: Kyung Seok Woo

ABSTRACT. With the rapid development of big data, there is a growing need for new computing hardware that can handle complex tasks which cannot be easily executed with classical computing based on the von Neumann architecture. Here we demonstrate a Cu0.3Te0.7/HfO2 ion-migration-driven memristor that exhibits a unique resistance switching behavior enabling universal Boolean logic. We demonstrate a novel logic scheme, specific to the switching behavior of the memristor, that performs all logic operations within three steps using a single device, a first for a bipolar resistive switching device. By constructing a memristive crossbar array, we present more complex operations, such as a full adder/subtractor, with the best reported energy efficiency. With the above computing primitives, we demonstrate encryption of user data using unclonable functions via logic operations. This newly proposed computing scheme could be a breakthrough in reducing the area and computational costs of in-memory logic operations.

16:40
Statistical Characterization of ReRAM Arrays for Analog In-Memory Computing
PRESENTER: Jesse Short

ABSTRACT. A key challenge in developing new memories for analog in-memory computing is being able to rapidly characterize statistics across a large number of analog devices. In this paper, we introduce a unique application-specific integrated circuit (ASIC) designed for characterizing ReRAM statistics in a crossbar array architecture. Using this platform, a routine is developed to eliminate stuck bits and provide 100% yield for a TaOx ReRAM memory cell. The platform allows us to characterize the noise and drift across multiple devices. We see that over five minutes the mean value of the weights is highly stable, changing by 1.3% or less. Nevertheless, the standard deviation of the weights typically increases more than the mean drift in the weights. Using the measured weight drift, we simulate the accuracy of an analog accelerator on the CIFAR-10 dataset and show that a near-numeric accuracy of 91.2% (ideal numeric is 91.5%) can be achieved at t=0, but that it decreases to 88.6% by 300 seconds.
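
The accuracy-versus-drift experiment can be mimicked in software by perturbing trained weights with noise whose spread grows over time. The sketch below does this for a synthetic linear classifier; the noise magnitudes and the task are invented and are not the paper's CIFAR-10 setup.

```python
import numpy as np

rng = np.random.default_rng(5)

# Synthetic two-class task with a known separating direction -- illustrative only.
w_true = rng.normal(size=64)
X = rng.normal(size=(2000, 64))
y = (X @ w_true > 0).astype(int)

def accuracy(w):
    """Classification accuracy of a linear threshold unit with weights w."""
    return ((X @ w > 0).astype(int) == y).mean()

# Programmed weights drift over time: model this as additive noise whose
# standard deviation grows, and watch accuracy fall from its t=0 value.
for sigma in (0.0, 0.05, 0.1, 0.2, 0.4):
    w_drift = w_true + rng.normal(0.0, sigma, size=64) * np.abs(w_true).mean()
    print(f"relative drift {sigma:.2f}: accuracy {accuracy(w_drift):.3f}")
```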

16:50
Dynamically Reconfigurable Electrochemical Random Access Memory Device for Neuromorphic Computing
PRESENTER: Sangheon Oh

ABSTRACT. Neuromorphic computing systems based on analog non-volatile memory devices have drawn a great deal of interest as a pathway toward biological brain-level data processing capability and energy efficiency. However, the lack of reconfigurability in these technologies limits their versatility and keeps conventional von Neumann architectures appealing for neuromorphic computing. In this work, we investigate a vanadium oxide based solid electrochemical random access memory (ECRAM) device as a dynamically reconfigurable component for neuromorphic computing. The ECRAM uses yttrium-stabilized zirconia (YSZ) as a solid electrolyte that conducts oxygen vacancies between the vanadium oxide layers on the gate and the source/drain electrodes. The oxidation state of the VO2 channel can be electrochemically modulated by applying a voltage bias (i.e., 3 V for reduction and -3 V for oxidation) between the gate and source/drain electrodes at an elevated temperature (i.e., 250 °C). After the ECRAM devices are programmed, they show excellent retention of their programmed states (i.e., < 1% loss in channel conductance over 10 years under short-circuit conditions) at room temperature. This unusually stable data retention stems from electrochemically controlled phase coexistence in the VO2 layers. As the channel oxidation state of an ECRAM is changed, its insulator-to-metal transition (IMT) switching threshold voltage is also tuned. We utilize this reconfigurability of our ECRAM to demonstrate a dynamically reconfigurable AND/OR gate: by tuning the switching voltage of one ECRAM, the gate can be configured as either an AND or an OR gate. The dynamic reconfigurability and stable data retention of our ECRAM provide a solution for versatile and dynamically reconfigurable neuromorphic computing.

17:00
Device Discovery for Magnetic Tunnel Junctions using Reinforcement Learning
PRESENTER: Karan Patel

ABSTRACT. In this work, we will present preliminary results using reinforcement learning for device discovery. We leverage a spin-orbit torque (SOT) MTJ model developed in Liu et al. 2022 for our analysis, where we vary key material and device parameters. The reward function accounts for minimizing energy, passing the chi^2 goodness-of-fit test for a given exponential distribution (the application task), and passing a configuration test that checks device validity. The observations comprising the environment include the current device parameters, the "score" for the current configuration, and finally the best configuration score discovered so far. AI-guided codesign can alleviate the challenges of interdisciplinary codesign and accelerate device discovery.
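
A reward of the kind described might be composed as follows; the binning, the weighting of the energy term, and the device-simulator stub are placeholders of ours, not the authors' code.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)

def device_switch_delays(params, n=5000):
    """Stub for the SOT-MTJ simulator: returns sampled switching delays."""
    return rng.exponential(scale=params["scale"], size=n)

def reward(params, target_scale=1.0, energy_weight=0.1):
    samples = device_switch_delays(params)
    # Bin the samples and compare against the target exponential distribution.
    counts, edges = np.histogram(samples, bins=20)
    probs = np.diff(stats.expon(scale=target_scale).cdf(edges))
    expected = probs * counts.sum() / probs.sum()
    chi2, p_value = stats.chisquare(counts, expected)
    # Toy energy proxy; a high p-value (good distribution fit) raises the
    # reward, while energy use lowers it.
    energy = params["current"] ** 2 * params["pulse_width"]
    return p_value - energy_weight * energy

print(reward({"scale": 1.0, "current": 1.0, "pulse_width": 0.5}))
```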

17:10
An FPGA-based Neuromorphic Processor with All-To-All Connectivity
PRESENTER: Disha Maheshwari

ABSTRACT. Neuromorphic computing is a promising paradigm for energy-efficient computing in the future. At present, however, it is in its nascent stages---most hardware implementations are research-grade, commercial products are not available, and the software tools are not production-ready. The lack of hardware and software tools makes neuromorphic computing inaccessible to researchers around the globe. To this end, we intend to build a low-cost, open-source, FPGA-based implementation of a digital neuromorphic processor that can be used by neuromorphic researchers all over the world. In this paper, we present the results from a preliminary implementation of the processor on a Xilinx Artix-7 FPGA using SystemVerilog. Specifically, our implementation supports the integrate-and-fire neuron model with two parameters each for neurons and synapses. Furthermore, it features all-to-all connectivity among all neurons on the hardware. We test our preliminary implementation on four test cases: bars and stripes dataset, shortest path algorithm, logic gates, and 8-3 encoder. We also perform a scalability study to understand the resource utilization of the FPGA as the number of all-to-all connected neurons increases. Our results indicate that with our implementation, the Artix-7 supports 65 neurons with all-to-all connectivity. Moreover, all the test cases mentioned above achieve 100% accuracy.
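
All-to-all connectivity means the recurrent weight matrix is dense over every neuron pair, so one network step is a single matrix-vector product. A minimal software model of the update follows; the parameters are arbitrary, and this is not the SystemVerilog design.

```python
import numpy as np

rng = np.random.default_rng(7)

N = 65                                 # neuron count the Artix-7 supports
W = rng.normal(0.0, 0.1, size=(N, N))  # dense all-to-all synaptic weights
v = np.zeros(N)                        # membrane potentials
spikes = rng.random(N) < 0.1           # some initial activity

for _ in range(10):
    v = v + W @ spikes                 # every neuron hears every spike
    spikes = v >= 1.0                  # integrate-and-fire threshold
    v[spikes] = 0.0                    # reset fired neurons

print(int(spikes.sum()), "neurons spiking after 10 steps")
```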

17:20
Reconfigurable functionalities for neuromorphic computing
PRESENTER: Erbin Qiu

ABSTRACT. Although CMOS technology is widely used, our approach introduces a dynamic system consisting of two thermally linked spiking oscillators based on Mott insulators and utilizing the insulator-to-metal transition. These spiking oscillators, known as neuristors, exhibit a wide range of reconfigurable electrical behaviors similar to biological neurons, including phenomena like the all-or-nothing law, type-II neuronal rate coding law, spike-in and DC out effect, spike-in and spike-out effect, and stochastic leaky integrate-and-firing law. Notably, we implement an inhibitory neuristor by trapping the metallic state, eliminating the need for intricate circuits. Moreover, the same device can achieve both excitatory and inhibitory functionalities by employing different inputs, enhancing its versatility. Importantly, we show the possibility of cascading neural layers through thermal interactions, eliminating the need for complex buffer circuits between layers. This innovative approach simplifies the implementation of reconfigurable cascading neural layers, offering great potential for scalable and energy-efficient thermal neural networks, thereby driving advancements in brain-inspired computing.

17:30
Deep Mapper: A Multi-Channel Single-Cycle Near-Sensor DNN Accelerator
PRESENTER: Mehrdad Morsali

ABSTRACT. This work proposes Deep Mapper, a new near-sensor resistive accelerator architecture for Deep Neural Network (DNN) inference that co-integrates the sensing and computing phases of resource-constrained edge devices. Deep Mapper is developed to intrinsically realize highly parallelized multi-channel processing of input frames, supported by a new dense, hardware-friendly mapping methodology. Our circuit-to-application simulation results on the DNN acceleration task show that Deep Mapper reaches an efficiency of 4.71 TOp/s/W, outperforming state-of-the-art near-/in-sensor accelerators.

17:40
Design Considerations for 3D Heterogeneous Integration Driven Analog Processing-in-Pixel for Extreme-Edge Intelligence
PRESENTER: Zihan Yin

ABSTRACT. Given the progress in computer vision, image sensors are broadening their capabilities, which requires adding data processing close to or within the pixel chips. In this context, in-pixel computing has emerged as a notable paradigm, offering the capability to process data within the pixel unit itself. Interestingly, state-of-the-art in-pixel paradigms rely on high-density 3D heterogeneous integration to establish a per-pixel connection with vertically aligned analog processing units. This article provides a comprehensive review of the most recent developments in in-pixel computing and its relation to 3D heterogeneous integration. It offers an in-depth examination of innovative circuit design, adaptations in algorithms, and the challenges in 3D integration technology for sensor chips, thereby presenting a holistic perspective on the future trajectory of in-pixel computing driven by advances in 3D integration.

18:00-19:00 Reception & Poster Session

Location: Amici (4th Floor)

Poster Info:

  1. Reconfigurable functionalities for neuromorphic computing. Erbin Qiu, Yuan-Hang Zhang, Massimiliano Di Ventra and Ivan K Schuller.
  2. Arithmetic Primitives for Efficient Neuromorphic Computing. Ahna Wurm, Rebecca Seay, Prasanna Date, Shruti R. Kulkarni, Aaron Young and Jeffrey Vetter.
  3. Performance comparison of memristor crossbar-based analog and FPGA-based digital weight-memory-less neural networks. Chinamu Kawano and Masanori Hashimoto.
  4. An FPGA-based Neuromorphic Processor with All-To-All Connectivity. Disha Maheshwari, Aaron Young, Prasanna Date, Shruti R. Kulkarni, Brett Witherspoon and Narasinga Rao Miniskar.
  5. Toward on-chip STDP learning on mixed-signal neuromorphic chips. Ashish Gautam, Takashi Kohno, and Prasanna Date.
  6. Encrypting based on synchronization of 3D chaotic systems on MPSoC. Roberto Herrera-Charles, Opeyemi Afolabi, José Nuñez-Pérez and Vincent Adeyemi.
  7. A Proof-of-Concept Prototyping of Reservoir Computing with Quantum Dots and an Image Sensor for Image Classification. Takuto Matsumoto, Ryo Shirai and Masanori Hashimoto.
  8. Using non-convex optimization in quantum process tomography: Factored gradient descent is tough to beat. David A. Quiroga and Anastasios Kyrillidis.
  9. On the Comparison of Swap and Entanglement Swapping in Noisy Intermediate Scale Quantum Computers. Jean-Baptiste Waring and Sébastien Le Beux.