Registration & Breakfast (Sponsored by Vaire Computing: vaire.co)
IRDS @ ICRC: Regional Greetings
- Enrico Sangiorgi, Francis Balestra, Yoshiro Hayashi, and Tom Conte
IRDS @ ICRC: The Dawn of the New Electronics Industry
- Paolo Gargini
Located in Amici (4th floor)
IRDS @ ICRC: Beyond CMOS and Emerging Materials Integration (EMI)
- Shamik Das and Joe Hagmann
ICRC 2024: Welcome!
- Joseph Friedman & Christopher Bennett
ICRC Technical Session 1: Novel Methods for In-memory Computing
11:10 | Compressed vector-matrix multiplication for Memristor-based ensemble neural networks
ABSTRACT. Using an ensemble of neural networks is an effective means of quantifying the uncertainty of an output prediction. However, the memory cost of storing a large ensemble of neural networks quickly becomes prohibitive and limits their applicability. This paper details a three-stage in-memory computing circuit that performs analog-domain vector-matrix multiplication between an input voltage vector and a rank-1 compressed weight ensemble stored in the conductances of three Memristor arrays. For wide layers (thousands of neurons) and large ensemble sizes (hundreds to thousands of models), this circuit reduces the required number of Memristors by between two and three orders of magnitude relative to a non-compressed ensemble. Compared to a single neural network, the increase in the number of Memristors may be less than two-fold. We report SPICE simulations of the circuit and observe that 75% of the total error does not deviate by more than 25% from the ideal value. A statistical analysis of the circuit explains these observations and offers insights into how the circuit may be improved.
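The memristor savings follow from the rank-1 structure itself. A minimal NumPy sketch, assuming a BatchEnsemble-style factorization W_e = W * (r_e s_e^T) (which may differ from the paper's exact circuit mapping), illustrates the device-count arithmetic:

    import numpy as np

    n_in, n_out, ensemble = 1024, 1024, 256   # hypothetical layer width / ensemble size
    rng = np.random.default_rng(0)

    W = rng.standard_normal((n_out, n_in))    # shared weight matrix (one crossbar)
    R = rng.standard_normal((ensemble, n_out))  # per-member rank-1 factors
    S = rng.standard_normal((ensemble, n_in))

    def member_output(x, e):
        """y_e = (W * outer(R[e], S[e])) @ x, without materializing the full matrix."""
        return R[e] * (W @ (S[e] * x))

    x = rng.standard_normal(n_in)
    ys = np.stack([member_output(x, e) for e in range(ensemble)])

    full_count = ensemble * n_out * n_in                       # one full matrix per member
    compressed = n_out * n_in + ensemble * (n_out + n_in)      # shared matrix + rank-1 factors
    print(f"devices: full={full_count:.2e}, compressed={compressed:.2e}, "
          f"ratio={full_count / compressed:.0f}x")

For these hypothetical sizes the reduction is roughly 170x, i.e. on the order of the two-to-three orders of magnitude quoted in the abstract.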
11:30 | Forward-Forward Learning on RRAM: Algorithm and Low-Voltage Reset Co-Optimization
ABSTRACT. Analog RRAM provides important energy savings for on-chip inference. Yet, edge applications such as medical sensors and predictive maintenance require on-chip learning as well. However, backpropagation poses great challenges due to its non-local nature and the need for precise updates. In 2022, Geoffrey Hinton introduced the “Forward-Forward” algorithm to address these challenges. This local learning algorithm, in which neurons learn to differentiate "positive" and "negative" inputs, requires minimal data movement. Hinton predicted that the Forward-Forward algorithm would be robust against the imperfections of memristors. We recently introduced three variations of the original Forward-Forward algorithm specifically adapted for high-energy-efficiency learning with RRAM, and experimentally validated the approach using a dedicated prototype in-memory computing platform equipped with 8k RRAM cells employing a TiN/HfOx/Ti/TiN stack.
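As background, a minimal sketch of the original Forward-Forward update (Hinton's published formulation with sum-of-squared-activations "goodness"; the three RRAM-adapted variants in the talk are not reproduced here) shows why the rule is local to each layer:

    import numpy as np

    def forward_forward_step(W, x_pos, x_neg, theta=2.0, lr=0.01):
        """One local update of a single layer: raise goodness on positive data,
        lower it on negative data. No gradient crosses layer boundaries."""
        for x, label in ((x_pos, 1.0), (x_neg, 0.0)):
            a = np.maximum(W @ x, 0.0)                    # ReLU activations
            goodness = np.sum(a ** 2)
            p = 1.0 / (1.0 + np.exp(-(goodness - theta))) # P(positive | goodness)
            # d(-log-likelihood)/dW = (p - label) * d(goodness)/dW,
            # with d(goodness)/dW = 2 * outer(a, x) through the ReLU mask
            W -= lr * (p - label) * 2.0 * np.outer(a, x)
        return W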
11:40 | HiAER-Spike: Hardware-Software Co-Design for Large-Scale Reconfigurable Event-Driven Neuromorphic Computing
ABSTRACT. In this work, we present HiAER-Spike, a modular, reconfigurable, event-driven neuromorphic computing platform designed to execute large spiking neural networks with up to 160 million neurons and 40 billion synapses (roughly twice the number of neurons in a mouse brain) at faster-than-real-time speeds. This system, which is currently under construction at the UC San Diego Supercomputing Center, comprises a co-designed hardware and software stack that is optimized for run-time massively parallel processing and hierarchical address-event routing (HiAER) of spikes while promoting memory-efficient network storage and execution. Our architecture efficiently handles both sparse connectivity and sparse activity for robust and low-latency event-driven inference for both edge and cloud computing. A Python programming interface to HiAER-Spike, agnostic to hardware-level detail, shields the user from complexity in the configuration and execution of general spiking neural networks with virtually no constraints in topology. The system is made easily available over a web portal for use by the wider community. In the following, we provide an overview of the hardware and software stack, explain the underlying design principles, demonstrate some of the system's capabilities, and solicit feedback from the broader neuromorphic community.
12:00 | StoX-Net: Stochastic Processing of Partial Sums for Efficient In-Memory Computing DNN Accelerators
PRESENTER: Ethan G. Rogers
ABSTRACT. Crossbar-based in-memory computing (IMC) has emerged as a promising platform for hardware acceleration of deep neural networks (DNNs). However, the energy and latency of IMC systems are dominated by the large overhead of the peripheral analog-to-digital converters (ADCs). To address this ADC bottleneck, we propose to implement stochastic processing of array-level partial sums (PS) for efficient IMC. Leveraging the probabilistic switching of spin-orbit torque magnetic tunnel junctions, the proposed PS processing eliminates the costly ADC and achieves significant improvements in energy and area efficiency. To mitigate accuracy loss, we develop PS-quantization-aware training that enables backward propagation across stochastic PS. Furthermore, a novel scheme with an inhomogeneous sampling length of the stochastic conversion is proposed. When running ResNet20 on the CIFAR-10 dataset, our architecture-to-algorithm co-design demonstrates up to 16x, 8x, and 10x improvements in energy, latency, and area, respectively, compared to IMC with a standard ADC. Our optimized design configuration using inhomogeneous sampling of stochastic PS achieves a 130x (24x) improvement in Energy-Delay Product compared to IMC with a full-precision ADC (sparse low-bit ADC), while maintaining near-software accuracy on various benchmark classification tasks.
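The core idea of replacing the ADC can be sketched in a few lines (a behavioral toy, not the StoX-Net circuit or its device model): each analog partial sum biases a stochastic element whose switching probability is assumed sigmoidal, and averaging a handful of binary switching events yields a low-precision digital estimate; the paper's inhomogeneous sampling scheme would vary the number of samples per conversion.

    import numpy as np

    rng = np.random.default_rng(1)

    def stochastic_readout(partial_sum, n_samples=8, gain=1.0):
        p = 1.0 / (1.0 + np.exp(-gain * partial_sum))  # assumed sigmoidal switching probability
        events = rng.random(n_samples) < p             # independent switch / no-switch trials
        return events.mean()                           # stochastic estimate in [0, 1]

    x = rng.standard_normal(64)
    W = rng.standard_normal((16, 64)) / 8.0
    partial_sums = W @ x                               # array-level partial sums from one crossbar
    digitized = np.array([stochastic_readout(s) for s in partial_sums])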
Located in Above Ash (16th floor)
Special Session: New Frontiers in Neuromorphic Computing - How Can We Compute, and at What Scales?
Located in Amici (4th floor)
ICRC Technical Session 2: Ising Machines & Optimization with Emerging Devices
15:25 | Towards High-Order Ising Machine Accelerators and SAT Solvers with In-Memory Computing
ABSTRACT. High-order Combinatorial Optimization Problems (COPs) are widespread and arise in circuit routing, protein folding, electronic structure prediction, and other areas of operations research. Iterative gradient computation, one of the core methods to solve such problems, is a computationally intensive task due to the large number of matrix-vector multiplications involved. This makes solving large-scale instances of COPs on conventional computers intractable. In this work, we propose an approach for massively parallel gradient calculations of high-degree polynomials, which is conducive to efficient mixed-signal in-memory computing circuit implementations and whose area scales proportionally with the product of the number of variables and the number of terms in the function and, most importantly, is independent of its degree.
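A hedged reading of the area claim (an illustrative derivation, not necessarily the authors' exact circuit mapping): write the PUBO energy over $n$ spins and $m$ terms as $E(s) = \sum_{k=1}^{m} c_k \prod_{i \in S_k} s_i$. Its gradient components are

    \frac{\partial E}{\partial s_j} = \sum_{k \,:\, j \in S_k} c_k \prod_{i \in S_k \setminus \{j\}} s_i ,

so an $n \times m$ array whose $(i, k)$ cell records whether variable $i$ appears in term $k$ can form all $m$ term products and accumulate them into the $n$ gradient entries in parallel; the footprint then grows with $n \cdot m$, independent of the degree $\max_k |S_k|$.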
15:35 | Gradient matching of higher order combinatorial optimization in quadratic Ising machines
PRESENTER: Dmitrii Dobrynin
ABSTRACT. The key limitation of physics-inspired hardware accelerators such as Ising machines (IM) or Hopfield neural networks (HNN) is their lack of support for higher-order couplings of spins/neurons. As a result, many classes of combinatorial optimization problems featuring polynomial interactions (PUBO) require quadratic hardware embeddings, i.e., QUBO mappings. Such embeddings were shown to risk severe performance reduction due to mismatched configuration-space features (energy landscape) compared to the native-space formulation. Here, we propose a gradient matching (GM) algorithm which leverages existing IM or in-memory HNN architectures and approximately recovers the native energy landscape. We show an exponential scaling advantage in the GM algorithm's time-to-solution. Finally, we propose a hardware design to physically realize our GM algorithm in an HNN-based solver.
15:55 | Architectural Considerations for Scalable Analog Ising Machines
PRESENTER: Pranav Mathews
ABSTRACT. Analog circuits can be used to efficiently implement Ising machines, a class of recurrent neural networks that gives good solutions to NP-hard problems. However, designers are faced with numerous choices when deciding how to implement the activation function, weighted connections, and network architecture of an analog Ising machine. This paper explores the tradeoffs between these architectural choices and discusses the best combination of choices for solving different NP-hard problems.
16:15 | Optimization of Magnetic Tunneling Junction Devices for Neuromorphic Circuits for Solving MAXCUT
ABSTRACT. Novel algorithms leveraging neuromorphic computation are at the forefront of algorithm design. Here, we investigate how stochastic devices integrate and perform with a novel neuromorphic algorithm for solving MAXCUT problems in graphs. We evaluate how using magnetic tunneling junctions (MTJs) as the devices that generate random numbers impacts the neuromorphic MAXCUT algorithm. We use both experimental MTJ data and a model of the device behavior to investigate MTJ performance on this task. We also leverage evolutionary optimization to tune the MTJ device to maximize performance on the algorithm and minimize the energy usage of the device.
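To make the device-algorithm coupling concrete, here is a generic randomized local search for MAXCUT (a sketch under stated assumptions, not the paper's neuromorphic algorithm) in which the pseudo-random draws stand in for bits produced by a stochastic MTJ; tuning the device changes the statistics and energy cost of exactly these draws.

    import numpy as np

    def cut_value(adj, spins):
        # Symmetric 0/1 adjacency: each cut edge contributes (1 - s_i s_j)/2,
        # and the double count over (i, j) and (j, i) is removed by // 4.
        return int(np.sum(adj * (1 - np.outer(spins, spins))) // 4)

    def maxcut_random_search(adj, n_steps=5000, flip_prob=0.05, seed=0):
        rng = np.random.default_rng(seed)          # stand-in for an MTJ bit stream
        n = adj.shape[0]
        spins = rng.integers(0, 2, n) * 2 - 1      # random +/-1 partition
        current = cut_value(adj, spins)
        best, best_cut = spins.copy(), current
        for _ in range(n_steps):
            flips = rng.random(n) < flip_prob      # stochastic bits decide which nodes flip
            trial = np.where(flips, -spins, spins)
            c = cut_value(adj, trial)
            if c >= current or rng.random() < 0.1: # accept improvements, occasionally accept worse
                spins, current = trial, c
            if current > best_cut:
                best, best_cut = spins.copy(), current
        return best, best_cut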
Located in Amici (4th floor)
ICRC Technical Session 3: Reservoir & Stochastic Computing
16:50 | Physical reservoir computing on discrete analog CMOS circuits and its application to real data analysis and prediction
ABSTRACT. This report describes popular benchmark tasks and time-series prediction with real-world temporal data using a physical reservoir computing device. We use a physical reservoir computer consisting of analog electronic circuits and run time-series prediction both in simulation and on the device.
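For readers unfamiliar with the recipe, the standard echo-state-style formulation underlying physical reservoir computing (fixed random reservoir, trained linear readout) can be sketched as follows; in the presented work the simulated reservoir below is replaced by discrete analog CMOS circuits.

    import numpy as np

    rng = np.random.default_rng(0)
    N, steps = 200, 1000
    W_in = rng.uniform(-0.5, 0.5, N)
    W = rng.standard_normal((N, N))
    W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))   # scale spectral radius below 1

    u = np.sin(0.2 * np.arange(steps + 1))            # toy input series; target = next value
    states = np.zeros((steps, N))
    x = np.zeros(N)
    for t in range(steps):
        x = np.tanh(W @ x + W_in * u[t])              # fixed, untrained reservoir dynamics
        states[t] = x

    # Ridge-regression readout: predict u[t+1] from the reservoir state at time t
    lam = 1e-6
    W_out = np.linalg.solve(states.T @ states + lam * np.eye(N), states.T @ u[1:steps + 1])
    pred = states @ W_out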
17:00 | Computing with a Chemical Reservoir
ABSTRACT. Scientific computing, data analytics, and artificial intelligence (in particular with the proliferation of large language models) are driving an explosive growth in computing needs. However, leading-edge high-performance computing systems composed of digital CMOS-based processing elements are reaching physical limits that no longer allow significant gains in energy efficiency. As we progress towards post-exascale computing systems, disruptive approaches are required to break this barrier in energy efficiency. Novel analog and hybrid digital-analog systems promise improvements in energy efficiency of orders of magnitude. Among the various solutions under exploration, biochemical computing has the potential to enable a new type of computing device with immense computational power. These devices can harness the efficiency of biological cells in solving optimization problems (chemical reactions naturally converge to optimal steady states) and are scalable by considering increasingly larger reaction systems or vessels, potentially meeting the high-performance requirements of scientific computing. However, several theoretical and practical limitations still exist, ranging from how we formulate and map problems to chemical reaction networks (CRNs) to how we should implement actual chemical reaction computing devices. In this paper, we propose a framework for chemical computation using biochemical systems and present initial components of our approach: an abstract chemical reaction dialect, implemented as a multi-level intermediate representation (MLIR) compiler extension, and a path for representing mathematical problems with CRNs. We demonstrate the potential of this approach by emulating a simplified chemical reservoir device. This work lays the foundation for leveraging chemistry's computing power to create energy-efficient, high-performance computing systems for contemporary computing needs.
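A minimal example of the "chemical reactions naturally converge to optimal steady states" idea (a toy under mass-action kinetics, unrelated to the paper's MLIR dialect): the single reaction A + B -> C drives the concentration of C toward min([A]_0, [B]_0), so the network's steady state computes the min function.

    import numpy as np

    def simulate_min_crn(a0, b0, k=1.0, dt=1e-3, t_end=50.0):
        a, b, c = a0, b0, 0.0
        for _ in range(int(t_end / dt)):
            rate = k * a * b       # mass-action rate of A + B -> C
            a -= rate * dt
            b -= rate * dt
            c += rate * dt
        return c

    print(simulate_min_crn(3.0, 5.0))  # approaches min(3, 5) = 3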
17:20 | Accelerating PDEs with Chiplet-Based Processing Chains
ABSTRACT. Innovative accelerator architectures aim to play a critical role in future performance improvements under the ceilings imposed by the end of Moore’s Law. Analog mesh computers are a class of such accelerators, designed to minimize time-to-solution by solving partial differential equations in one shot. However, the limited programmability of analog mesh computers does not support the PDE-solver requirement to match arbitrary PDE mesh shapes. In this work, we introduce a chiplet-based architecture capable of solving arbitrary PDE mesh shapes by chaining neural network acceleration chiplets and analog mesh computers. Specifically, we use physics-informed neural networks to infer the values at the perimeter of the analog mesh computer, and then use the analog mesh computer to solve for the remainder of the PDE. We then investigate resource scheduling strategies for the chiplet-based PDE acceleration architecture. Additionally, we propose a figure of merit that enables comparisons between classes of PDE accelerators. We show that the chiplet-based accelerator achieves a speedup of 2x compared to existing solutions.
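The division of labor in the processing chain can be illustrated with a toy Laplace problem (a hedged sketch: the trained physics-informed network is replaced by an analytic stand-in, and the analog mesh computer is emulated with Jacobi relaxation):

    import numpy as np

    def solve_with_chain(boundary_model, n=32, iters=2000):
        grid = np.zeros((n, n))
        xs, ys = np.linspace(0, 1, n), np.linspace(0, 1, n)
        # Stage 1: the neural-network chiplet predicts values on the mesh perimeter.
        grid[0, :], grid[-1, :] = boundary_model(xs, 0.0), boundary_model(xs, 1.0)
        grid[:, 0], grid[:, -1] = boundary_model(0.0, ys), boundary_model(1.0, ys)
        # Stage 2: the analog mesh (emulated by Jacobi iteration) relaxes the interior.
        for _ in range(iters):
            grid[1:-1, 1:-1] = 0.25 * (grid[:-2, 1:-1] + grid[2:, 1:-1]
                                       + grid[1:-1, :-2] + grid[1:-1, 2:])
        return grid

    # Stand-in for the trained PINN: an analytic harmonic function of (x, y).
    u = solve_with_chain(lambda x, y: np.sin(np.pi * np.asarray(x)) * np.sinh(np.pi * np.asarray(y)))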
17:40 | Thermodynamic Bayesian Inference
PRESENTER: Maxwell Aifer
ABSTRACT. A fully Bayesian treatment of complicated predictive models (such as deep neural networks) would enable rigorous uncertainty quantification and the automation of higher-level tasks including model selection. However, the intractability of sampling Bayesian posteriors over many parameters inhibits the use of Bayesian methods where they are most needed. Thermodynamic computing has emerged as a paradigm for accelerating operations used in machine learning, such as matrix inversion, and is based on the mapping of Langevin equations to the dynamics of noisy physical systems. Hence, it is natural to consider the implementation of Langevin sampling algorithms on thermodynamic devices. In this work we propose electronic analog devices that sample from Bayesian posteriors by realizing Langevin dynamics physically. Circuit designs are given for sampling the posterior of a Gaussian-Gaussian model and for Bayesian logistic regression, and are validated by simulations. It is shown, under reasonable assumptions, that the Bayesian posteriors for these models can be sampled in time scaling with $\ln(d)$, where $d$ is dimension. For the Gaussian-Gaussian model, the energy cost is shown to scale with $d \ln(d)$. These results highlight the potential for fast, energy-efficient Bayesian inference using thermodynamic computing.
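The underlying sampling idea, simulated digitally for the Gaussian-Gaussian case (the paper realizes these dynamics with analog circuits; this is only a sketch of overdamped Langevin sampling, not the proposed hardware):

    import numpy as np

    rng = np.random.default_rng(0)
    d, n_obs = 4, 20
    prior_prec = np.eye(d)                         # prior: N(0, I)
    lik_prec = 2.0 * np.eye(d)                     # likelihood: N(theta, 0.5 I)
    data = rng.standard_normal((n_obs, d)) + 1.0   # toy observations

    post_prec = prior_prec + n_obs * lik_prec      # exact Gaussian-Gaussian posterior
    post_mean = np.linalg.solve(post_prec, lik_prec @ data.sum(axis=0))

    def grad_neg_log_post(theta):
        return prior_prec @ theta + lik_prec @ (n_obs * theta - data.sum(axis=0))

    # Euler-Maruyama discretization of overdamped Langevin dynamics
    theta, dt, samples = np.zeros(d), 1e-3, []
    for step in range(20000):
        noise = rng.standard_normal(d)
        theta = theta - dt * grad_neg_log_post(theta) + np.sqrt(2 * dt) * noise
        if step > 5000:                            # discard burn-in
            samples.append(theta.copy())

    print(np.mean(samples, axis=0), post_mean)     # Langevin mean approaches the exact mean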
Poster Session & Happy Hour
Confirmed Poster Participants:
- “Accurate and Efficient Reservoir Computing with a Multifunctional Proton-Copper ECRAM”, Caroline Smith, ASU, Tempe AZ, USA
- "SPICEPilot: Navigating SPICE Code Generation and Simulation with AI Guidance," , Deepak Vungalara , New Jersey Institute of Technology, New Jersey, USA
- “An Efficient Convolutional Neural Network Analog Architecture,” Jennifer Hasler, Georgia Tech, Atlanta, GA, USA
- “Compressed vector-matrix multiplication for Memristor-based ensemble neural networks”, Phan Ahn Vu, CEA List, Univ. Grenoble, Grenoble, France
- "HiAER-Spike: Hardware-Software Co-Design for Large-Scale Reconfigurable Event-Driven Neuromorphic Computing", Omowuyi Olajide, University of California, San Diego, CA, USA
- "Physical reservoir computing on discrete analog CMOS circuits and its application to real data analysis and prediction", Shimon Matsuno, Hokkaido University, Japan
- "Accelerating PageRank Algorithmic Tasks with a New Programmable Hardware Architecture", Rownak Chowdhury, University of Missouri-Kansas City, Kansas City, MO, USA
- "Forward-Forward Learning on RRAM: Algorithm and Low-Voltage Reset Co-Optimization", Adrien Renaudineau, Universite Paris-Saclay, Paris, France
- "Filament-Free Bulk RRAM with High Endurance and Long Retention for Few-Shot Learning On-Chip", Yucheng Zhou, Department of NanoEngineering, University of California, San Diego, CA, USA