Registration & Breakfast (Sponsored by Vaire Computing: vaire.co)
IRDS @ ICRC: Regional Greetings
- Enrico Sangiorgi, Francis Balestra, Yoshiro Hayashi, and Tom Conte
IRDS @ ICRC: The Dawn of the New Electronics Industry
- Paolo Gargini
Located in Amici (4th floor)
IRDS @ ICRC: Beyond CMOS and Emerging Materials Integration (EMI)
- Shamik Das and Joe Hagmann
ICRC 2024: Welcome!
- Joseph Friedman & Christopher Bennett
ICRC Technical Session 1: Novel Methods for In-memory Computing
11:10 | Compressed vector-matrix multiplication for Memristor-based ensemble neural networks
ABSTRACT. Using an ensemble of neural networks is an effective means of quantifying the uncertainty of an output prediction. However, the memory cost of storing a large ensemble of neural networks quickly becomes prohibitive and limits their applicability. This paper details a three-stage in-memory computing circuit that performs analog-domain vector-matrix multiplication between an input voltage vector and a rank-1 compressed weight ensemble stored in the conductances of three Memristor arrays. For wide layers (thousands of neurons) and large ensemble sizes (hundreds to thousands of models), this circuit reduces the required number of Memristors by between two and three orders of magnitude relative to a non-compressed ensemble. Compared to a single neural network, the increase in the number of Memristors may be less than two-fold. We report SPICE simulations of the circuit and observe that 75% of the total error does not deviate by more than 25% from the ideal value. A statistical analysis of the circuit explains these observations and offers insights into how the circuit may be improved.
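The memristor savings follow from the rank-1 structure itself. A minimal NumPy sketch, assuming a BatchEnsemble-style factorization W_e = W * (r_e s_e^T) (which may differ from the paper's exact circuit mapping), illustrates the device-count arithmetic:

    import numpy as np

    n_in, n_out, ensemble = 1024, 1024, 256   # hypothetical layer width / ensemble size
    rng = np.random.default_rng(0)

    W = rng.standard_normal((n_out, n_in))    # shared weight matrix (one crossbar)
    R = rng.standard_normal((ensemble, n_out))  # per-member rank-1 factors
    S = rng.standard_normal((ensemble, n_in))

    def member_output(x, e):
        """y_e = (W * outer(R[e], S[e])) @ x, without materializing the full matrix."""
        return R[e] * (W @ (S[e] * x))

    x = rng.standard_normal(n_in)
    ys = np.stack([member_output(x, e) for e in range(ensemble)])

    full_count = ensemble * n_out * n_in                       # one full matrix per member
    compressed = n_out * n_in + ensemble * (n_out + n_in)      # shared matrix + rank-1 factors
    print(f"devices: full={full_count:.2e}, compressed={compressed:.2e}, "
          f"ratio={full_count / compressed:.0f}x")

For these hypothetical sizes the reduction is roughly 170x, i.e. on the order of the two-to-three orders of magnitude quoted in the abstract.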
11:30 | Forward-Forward Learning on RRAM: Algorithm and Low-Voltage Reset Co-Optimization
ABSTRACT. Analog RRAM provides important energy savings for on-chip inference. Yet, edge applications such as medical sensors and predictive maintenance require on-chip learning as well. However, backpropagation poses great challenges due to its non-local nature and the need for precise updates. In 2022, Geoffrey Hinton introduced the “Forward-Forward” algorithm to address these challenges. This local learning algorithm, in which neurons learn to differentiate "positive" and "negative" inputs, requires minimal data movement. Hinton predicted that the Forward-Forward algorithm would be robust against the imperfections of memristors. We recently introduced three variations of the original Forward-Forward algorithm specifically adapted for high-energy-efficiency learning with RRAM, and experimentally validated the approach using a dedicated prototype in-memory computing platform equipped with 8k RRAM cells employing a TiN/HfOx/Ti/TiN stack.
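As background, a minimal sketch of the original Forward-Forward update (Hinton's published formulation with sum-of-squared-activations "goodness"; the three RRAM-adapted variants in the talk are not reproduced here) shows why the rule is local to each layer:

    import numpy as np

    def forward_forward_step(W, x_pos, x_neg, theta=2.0, lr=0.01):
        """One local update of a single layer: raise goodness on positive data,
        lower it on negative data. No gradient crosses layer boundaries."""
        for x, label in ((x_pos, 1.0), (x_neg, 0.0)):
            a = np.maximum(W @ x, 0.0)                    # ReLU activations
            goodness = np.sum(a ** 2)
            p = 1.0 / (1.0 + np.exp(-(goodness - theta))) # P(positive | goodness)
            # d(-log-likelihood)/dW = (p - label) * d(goodness)/dW,
            # with d(goodness)/dW = 2 * outer(a, x) through the ReLU mask
            W -= lr * (p - label) * 2.0 * np.outer(a, x)
        return W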
11:40 | HiAER-Spike: Hardware-Software Co-Design for Large-Scale Reconfigurable Event-Driven Neuromorphic Computing
ABSTRACT. In this work, we present HiAER-Spike, a modular, reconfigurable, event-driven neuromorphic computing platform designed to execute large spiking neural networks with up to 160 million neurons and 40 billion synapses (roughly twice the number of neurons in a mouse brain) at faster-than-real-time speeds. This system, which is currently under construction at the UC San Diego Supercomputing Center, comprises a co-designed hardware and software stack that is optimized for run-time massively parallel processing and hierarchical address-event routing (HiAER) of spikes while promoting memory-efficient network storage and execution. Our architecture efficiently handles both sparse connectivity and sparse activity for robust and low-latency event-driven inference for both edge and cloud computing. A Python programming interface to HiAER-Spike, agnostic to hardware-level detail, shields the user from complexity in the configuration and execution of general spiking neural networks with virtually no constraints in topology. The system is made easily available over a web portal for use by the wider community. In the following, we provide an overview of the hardware and software stack, explain the underlying design principles, demonstrate some of the system's capabilities, and solicit feedback from the broader neuromorphic community.
12:00 | StoX-Net: Stochastic Processing of Partial Sums for Efficient In-Memory Computing DNN Accelerators
PRESENTER: Ethan G. Rogers
ABSTRACT. Crossbar-based in-memory computing (IMC) has emerged as a promising platform for hardware acceleration of deep neural networks (DNNs). However, the energy and latency of IMC systems are dominated by the large overhead of the peripheral analog-to-digital converters (ADCs). To address this ADC bottleneck, we propose to implement stochastic processing of array-level partial sums (PS) for efficient IMC. Leveraging the probabilistic switching of spin-orbit torque magnetic tunnel junctions, the proposed PS processing eliminates the costly ADC and achieves significant improvements in energy and area efficiency. To mitigate accuracy loss, we develop PS-quantization-aware training that enables backward propagation across stochastic PS. Furthermore, a novel scheme with an inhomogeneous sampling length of the stochastic conversion is proposed. When running ResNet20 on the CIFAR-10 dataset, our architecture-to-algorithm co-design demonstrates up to 16x, 8x, and 10x improvements in energy, latency, and area, respectively, compared to IMC with a standard ADC. Our optimized design configuration using inhomogeneous sampling of stochastic PS achieves a 130x (24x) improvement in Energy-Delay Product compared to IMC with a full-precision ADC (sparse low-bit ADC), while maintaining near-software accuracy on various benchmark classification tasks.
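The core idea of replacing the ADC can be sketched in a few lines (a behavioral toy, not the StoX-Net circuit or its device model): each analog partial sum biases a stochastic element whose switching probability is assumed sigmoidal, and averaging a handful of binary switching events yields a low-precision digital estimate; the paper's inhomogeneous sampling scheme would vary the number of samples per conversion.

    import numpy as np

    rng = np.random.default_rng(1)

    def stochastic_readout(partial_sum, n_samples=8, gain=1.0):
        p = 1.0 / (1.0 + np.exp(-gain * partial_sum))  # assumed sigmoidal switching probability
        events = rng.random(n_samples) < p             # independent switch / no-switch trials
        return events.mean()                           # stochastic estimate in [0, 1]

    x = rng.standard_normal(64)
    W = rng.standard_normal((16, 64)) / 8.0
    partial_sums = W @ x                               # array-level partial sums from one crossbar
    digitized = np.array([stochastic_readout(s) for s in partial_sums])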
Located in Above Ash (16th floor)
Special Session: New Frontiers in Neuromorphic Computing - How Can We Compute, and at What Scales?
Located in Amici (4th floor)
ICRC Technical Session 2: Ising Machines & Optimization with Emerging Devices
15:25 | Towards High-Order Ising Machine Accelerators and SAT Solvers with In-Memory Computing
ABSTRACT. High-order Combinatorial Optimization Problems (COPs) are widespread and arise in circuit routing, protein folding, electronic structure prediction, and other areas of operations research. Iterative gradient computation, one of the core methods to solve such problems, is a computationally intensive task due to the large number of matrix-vector multiplications involved. This makes solving large-scale instances of COPs on conventional computers intractable. In this work, we propose an approach for massively parallel gradient calculations of high-degree polynomials, which is conducive to efficient mixed-signal in-memory computing circuit implementations and whose area scales proportionally with the product of the number of variables and the number of terms in the function and, most importantly, is independent of its degree.
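A hedged reading of the area claim (an illustrative derivation, not necessarily the authors' exact circuit mapping): write the PUBO energy over $n$ spins and $m$ terms as $E(s) = \sum_{k=1}^{m} c_k \prod_{i \in S_k} s_i$. Its gradient components are

    \frac{\partial E}{\partial s_j} = \sum_{k \,:\, j \in S_k} c_k \prod_{i \in S_k \setminus \{j\}} s_i ,

so an $n \times m$ array whose $(i, k)$ cell records whether variable $i$ appears in term $k$ can form all $m$ term products and accumulate them into the $n$ gradient entries in parallel; the footprint then grows with $n \cdot m$, independent of the degree $\max_k |S_k|$.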
15:35 | Gradient matching of higher order combinatorial optimization in quadratic Ising machines
PRESENTER: Dmitrii Dobrynin
ABSTRACT. The key limitation of physics-inspired hardware accelerators such as Ising machines (IM) or Hopfield neural networks (HNN) is their lack of support for higher-order couplings of spins/neurons. As a result, many classes of combinatorial optimization problems featuring polynomial interactions (PUBO) require quadratic hardware embeddings, i.e., QUBO mappings. Such embeddings were shown to risk severe performance reduction due to mismatched configuration-space features (energy landscape) compared to the native-space formulation. Here, we propose a gradient matching (GM) algorithm which leverages existing IM or in-memory HNN architectures and approximately recovers the native energy landscape. We show an exponential scaling advantage in the GM algorithm's time-to-solution. Finally, we propose a hardware design to physically realize our GM algorithm in an HNN-based solver.
15:55 | Architectural Considerations for Scalable Analog Ising Machines
PRESENTER: Pranav Mathews
ABSTRACT. Analog circuits can be used to efficiently implement Ising machines, a class of recurrent neural networks that gives good solutions to NP-hard problems. However, designers are faced with numerous choices when deciding how to implement the activation function, weighted connections, and network architecture of an analog Ising machine. This paper explores the tradeoffs between these architectural choices and discusses the best combination of choices for solving different NP-hard problems.
16:15 | Optimization of Magnetic Tunneling Junction Devices for Neuromorphic Circuits for Solving MAXCUT
ABSTRACT. Novel algorithms leveraging neuromorphic computation are at the forefront of algorithm design. Here, we investigate how stochastic devices integrate and perform with a novel neuromorphic algorithm for solving MAXCUT problems in graphs. We evaluate how using magnetic tunneling junctions (MTJs) as the devices that generate random numbers impacts the neuromorphic MAXCUT algorithm. We use both experimental MTJ data and a model of the device behavior to investigate MTJ performance on this task. We also leverage evolutionary optimization to tune the MTJ device to maximize performance on the algorithm and minimize the energy usage of the device.
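To make the device-algorithm coupling concrete, here is a generic randomized local search for MAXCUT (a sketch under stated assumptions, not the paper's neuromorphic algorithm) in which the pseudo-random draws stand in for bits produced by a stochastic MTJ; tuning the device changes the statistics and energy cost of exactly these draws.

    import numpy as np

    def cut_value(adj, spins):
        # Symmetric 0/1 adjacency: each cut edge contributes (1 - s_i s_j)/2,
        # and the double count over (i, j) and (j, i) is removed by // 4.
        return int(np.sum(adj * (1 - np.outer(spins, spins))) // 4)

    def maxcut_random_search(adj, n_steps=5000, flip_prob=0.05, seed=0):
        rng = np.random.default_rng(seed)          # stand-in for an MTJ bit stream
        n = adj.shape[0]
        spins = rng.integers(0, 2, n) * 2 - 1      # random +/-1 partition
        current = cut_value(adj, spins)
        best, best_cut = spins.copy(), current
        for _ in range(n_steps):
            flips = rng.random(n) < flip_prob      # stochastic bits decide which nodes flip
            trial = np.where(flips, -spins, spins)
            c = cut_value(adj, trial)
            if c >= current or rng.random() < 0.1: # accept improvements, occasionally accept worse
                spins, current = trial, c
            if current > best_cut:
                best, best_cut = spins.copy(), current
        return best, best_cut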
Located in Amici (4th floor)
ICRC Technical Session 3: Reservoir & Stochastic Computing
16:50 | Physical reservoir computing on discrete analog CMOS circuits and its application to real data analysis and prediction
ABSTRACT. This report describes popular benchmark tasks and time-series prediction with real-world temporal data using a physical reservoir computing device. We use a physical reservoir computer consisting of analog electronic circuits and run time-series prediction both in simulation and on the device.
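For readers unfamiliar with the recipe, the standard echo-state-style formulation underlying physical reservoir computing (fixed random reservoir, trained linear readout) can be sketched as follows; in the presented work the simulated reservoir below is replaced by discrete analog CMOS circuits.

    import numpy as np

    rng = np.random.default_rng(0)
    N, steps = 200, 1000
    W_in = rng.uniform(-0.5, 0.5, N)
    W = rng.standard_normal((N, N))
    W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))   # scale spectral radius below 1

    u = np.sin(0.2 * np.arange(steps + 1))            # toy input series; target = next value
    states = np.zeros((steps, N))
    x = np.zeros(N)
    for t in range(steps):
        x = np.tanh(W @ x + W_in * u[t])              # fixed, untrained reservoir dynamics
        states[t] = x

    # Ridge-regression readout: predict u[t+1] from the reservoir state at time t
    lam = 1e-6
    W_out = np.linalg.solve(states.T @ states + lam * np.eye(N), states.T @ u[1:steps + 1])
    pred = states @ W_out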
17:00 | Computing with a Chemical Reservoir
ABSTRACT. Scientific computing, data analytics, and artificial intelligence (in particular with the proliferation of large language models) are driving an explosive growth in computing needs. However, leading-edge high-performance computing systems composed of digital CMOS-based processing elements are reaching physical limits that no longer allow significant gains in energy efficiency. As we progress towards post-exascale computing systems, disruptive approaches are required to break this barrier in energy efficiency. Novel analog and hybrid digital-analog systems promise improvements in energy efficiency of orders of magnitude. Among the various solutions under exploration, biochemical computing has the potential to enable a new type of computing device with immense computational power. These devices can harness the efficiency of biological cells in solving optimization problems (chemical reactions naturally converge to optimal steady states) and are scalable by considering increasingly larger reaction systems or vessels, potentially meeting the high-performance requirements of scientific computing. However, several theoretical and practical limitations still exist, ranging from how we formulate and map problems to chemical reaction networks (CRNs) to how we should implement actual chemical reaction computing devices. In this paper, we propose a framework for chemical computation using biochemical systems and present initial components of our approach: an abstract chemical reaction dialect, implemented as a multi-level intermediate representation (MLIR) compiler extension, and a path for representing mathematical problems with CRNs. We demonstrate the potential of this approach by emulating a simplified chemical reservoir device. This work lays the foundation for leveraging chemistry's computing power to create energy-efficient, high-performance computing systems for contemporary computing needs.
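A minimal example of the "chemical reactions naturally converge to optimal steady states" idea (a toy under mass-action kinetics, unrelated to the paper's MLIR dialect): the single reaction A + B -> C drives the concentration of C toward min([A]_0, [B]_0), so the network's steady state computes the min function.

    import numpy as np

    def simulate_min_crn(a0, b0, k=1.0, dt=1e-3, t_end=50.0):
        a, b, c = a0, b0, 0.0
        for _ in range(int(t_end / dt)):
            rate = k * a * b       # mass-action rate of A + B -> C
            a -= rate * dt
            b -= rate * dt
            c += rate * dt
        return c

    print(simulate_min_crn(3.0, 5.0))  # approaches min(3, 5) = 3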
17:20 | Accelerating PDEs with Chiplet-Based Processing Chains
ABSTRACT. Innovative accelerator architectures aim to play a critical role in future performance improvements under the ceilings imposed by the end of Moore’s Law. Analog mesh computers are a class of such accelerators, designed to minimize time-to-solution by solving partial differential equations in one shot. However, the limited programmability of analog mesh computers does not support the PDE-solver requirement to match arbitrary PDE mesh shapes. In this work, we introduce a chiplet-based architecture capable of solving arbitrary PDE mesh shapes by chaining neural network acceleration chiplets and analog mesh computers. Specifically, we use physics-informed neural networks to infer the values at the perimeter of the analog mesh computer, and then use the analog mesh computer to solve for the remainder of the PDE. We then investigate resource scheduling strategies for the chiplet-based PDE acceleration architecture. Additionally, we propose a figure of merit that enables comparisons between classes of PDE accelerators. We show that the chiplet-based accelerator achieves a speedup of 2x compared to existing solutions.
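The division of labor in the processing chain can be illustrated with a toy Laplace problem (a hedged sketch: the trained physics-informed network is replaced by an analytic stand-in, and the analog mesh computer is emulated with Jacobi relaxation):

    import numpy as np

    def solve_with_chain(boundary_model, n=32, iters=2000):
        grid = np.zeros((n, n))
        xs, ys = np.linspace(0, 1, n), np.linspace(0, 1, n)
        # Stage 1: the neural-network chiplet predicts values on the mesh perimeter.
        grid[0, :], grid[-1, :] = boundary_model(xs, 0.0), boundary_model(xs, 1.0)
        grid[:, 0], grid[:, -1] = boundary_model(0.0, ys), boundary_model(1.0, ys)
        # Stage 2: the analog mesh (emulated by Jacobi iteration) relaxes the interior.
        for _ in range(iters):
            grid[1:-1, 1:-1] = 0.25 * (grid[:-2, 1:-1] + grid[2:, 1:-1]
                                       + grid[1:-1, :-2] + grid[1:-1, 2:])
        return grid

    # Stand-in for the trained PINN: an analytic harmonic function of (x, y).
    u = solve_with_chain(lambda x, y: np.sin(np.pi * np.asarray(x)) * np.sinh(np.pi * np.asarray(y)))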
17:40 | Thermodynamic Bayesian Inference
PRESENTER: Maxwell Aifer
ABSTRACT. A fully Bayesian treatment of complicated predictive models (such as deep neural networks) would enable rigorous uncertainty quantification and the automation of higher-level tasks including model selection. However, the intractability of sampling Bayesian posteriors over many parameters inhibits the use of Bayesian methods where they are most needed. Thermodynamic computing has emerged as a paradigm for accelerating operations used in machine learning, such as matrix inversion, and is based on the mapping of Langevin equations to the dynamics of noisy physical systems. Hence, it is natural to consider the implementation of Langevin sampling algorithms on thermodynamic devices. In this work we propose electronic analog devices that sample from Bayesian posteriors by realizing Langevin dynamics physically. Circuit designs are given for sampling the posterior of a Gaussian-Gaussian model and for Bayesian logistic regression, and are validated by simulations. It is shown, under reasonable assumptions, that the Bayesian posteriors for these models can be sampled in time scaling with $\ln(d)$, where $d$ is dimension. For the Gaussian-Gaussian model, the energy cost is shown to scale with $d \ln(d)$. These results highlight the potential for fast, energy-efficient Bayesian inference using thermodynamic computing.
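The underlying sampling idea, simulated digitally for the Gaussian-Gaussian case (the paper realizes these dynamics with analog circuits; this is only a sketch of overdamped Langevin sampling, not the proposed hardware):

    import numpy as np

    rng = np.random.default_rng(0)
    d, n_obs = 4, 20
    prior_prec = np.eye(d)                         # prior: N(0, I)
    lik_prec = 2.0 * np.eye(d)                     # likelihood: N(theta, 0.5 I)
    data = rng.standard_normal((n_obs, d)) + 1.0   # toy observations

    post_prec = prior_prec + n_obs * lik_prec      # exact Gaussian-Gaussian posterior
    post_mean = np.linalg.solve(post_prec, lik_prec @ data.sum(axis=0))

    def grad_neg_log_post(theta):
        return prior_prec @ theta + lik_prec @ (n_obs * theta - data.sum(axis=0))

    # Euler-Maruyama discretization of overdamped Langevin dynamics
    theta, dt, samples = np.zeros(d), 1e-3, []
    for step in range(20000):
        noise = rng.standard_normal(d)
        theta = theta - dt * grad_neg_log_post(theta) + np.sqrt(2 * dt) * noise
        if step > 5000:                            # discard burn-in
            samples.append(theta.copy())

    print(np.mean(samples, axis=0), post_mean)     # Langevin mean approaches the exact mean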
Poster Session & Happy Hour
Confirmed Poster Participants:
- “Accurate and Efficient Reservoir Computing with a Multifunctional Proton-Copper ECRAM”, Caroline Smith, ASU, Tempe AZ, USA
- "SPICEPilot: Navigating SPICE Code Generation and Simulation with AI Guidance," , Deepak Vungalara , New Jersey Institute of Technology, New Jersey, USA
- “An Efficient Convolutional Neural Network Analog Architecture,” Jennifer Hasler, Georgia Tech, Atlanta, GA, USA
- “Compressed vector-matrix multiplication for Memristor-based ensemble neural networks”, Phan Ahn Vu, CEA List, Univ. Grenoble, Grenoble, France
- "HiAER-Spike: Hardware-Software Co-Design for Large-Scale Reconfigurable Event-Driven Neuromorphic Computing", Omowuyi Olajide, University of California, San Diego, CA, USA
- "Physical reservoir computing on discrete analog CMOS circuits and its application to real data analysis and prediction", Shimon Matsuno, Hokkaido University, Japan
- "Accelerating PageRank Algorithmic Tasks with a New Programmable Hardware Architecture", Rownak Chowdhury, University of Missouri-Kansas City, Kansas City, MO, USA
- "Forward-Forward Learning on RRAM: Algorithm and Low-Voltage Reset Co-Optimization", Adrien Renaudineau, Universite Paris-Saclay, Paris, France
- "Filament-Free Bulk RRAM with High Endurance and Long Retention for Few-Shot Learning On-Chip", Yucheng Zhou, Department of NanoEngineering, University of California, San Diego, CA, USA