Program for Monday, December 18th

08:30-09:00 Registration & Welcome Address

Chair:

Ronald Tetzlaff

09:00-10:15 Session 1: Keynote Speaker: Subhasish Mitra

Chair:

Ronald Tetzlaff

Location: Dülfer Hall

09:00

Subhasish Mitra

The Future of Hardware Technologies for Computing

ABSTRACT. The computation demands of 21st-century abundant-data workloads, such as AI/machine learning, far exceed the capabilities of today’s computing systems. For example, a Dream AI Chip would ideally co-locate all memory and compute on a single chip, quickly accessible at low energy. Such Dream Chips aren’t realizable today. Computing systems instead use large off-chip memory and spend enormous time and energy shuttling data back and forth. This memory wall gets worse with growing problem sizes, especially as conventional transistor miniaturization gets increasingly difficult. The next leap in computing requires transformative NanoSystems by exploiting the unique characteristics of nanotechnologies and abundant-data workloads. We create new chip architectures through ultra-dense 3D integration of logic and memory – the N3XT 3D approach. Multiple N3XT 3D chips are integrated through a continuum of chip stacking/interposer/wafer-level integration — the N3XT 3D MOSAIC. To scale with growing problem sizes, new Illusion systems orchestrate workload execution on N3XT 3D MOSAIC creating an illusion of a Dream Chip with near-Dream energy and throughput. Several hardware prototypes, built in commercial and research fabrication facilities, demonstrate the effectiveness of our approach. We target 1,000X system-level energy-delay-product benefits, especially for abundant-data workloads. We also address new ways of ensuring robust system operation despite growing challenges of design bugs, manufacturing defects, reliability failures, and security attacks.

10:15-10:30 Coffee Break

10:30-12:36 Session 2: Advanced Computing Architectures and Systems

Chair:

Ioannis Messaris

Location: Dülfer Hall

10:30

Jiaming Li, Bin Gao, Ruihua Yu, Peng Yao, Jianshi Tang, He Qian and Huaqiang Wu

A Spatial-Designed Computing-In-Memory Architecture Based on Monolithic 3D Integration for High-Performance Systems

PRESENTER: Jiaming Li

ABSTRACT. Computing-in-memory (CIM) technology effectively addresses the data-movement bottleneck of the traditional von Neumann architecture, especially for deep neural network (DNN) acceleration. However, with the improving performance and parallelism of CIM processing elements (PEs), the substantial latency and power overhead caused by high-density transmission of intermediate results has become a new bottleneck in CIM architectures. In this paper, we propose a spatial-designed CIM architecture based on the emerging Monolithic 3D (M3D) technology, and a spatiality-aware DNN mapping method for high-performance CIM systems. The proposed architecture introduces a novel hierarchy by implementing staggered tiers, enabling PEs to be shared by multiple tiles, and uses the ultra-dense and lower-power Inter-Layer Vias (ILVs) in M3D as shared buses, enabling CIM PEs to exploit the ultra-high bandwidth of M3D for inter-tile and intra-tile data transfer. Our experimental results show that the proposed M3D-enabled CIM architecture, combined with the proposed mapping method, achieves a 6.52x latency improvement, a 40.84x interconnection energy-delay product (EDP) improvement, and a 7.62x system-level EDP improvement compared to the state-of-the-art CIM architecture.

10:48

Jan Drewniok, Marcel Walter and Robert Wille

Minimal Design of SiDB Gates: An Optimal Basis for Circuits Based on Silicon Dangling Bonds

PRESENTER: Jan Drewniok

ABSTRACT. Silicon Dangling Bonds (SiDBs) present a promising computational technology that goes beyond traditional CMOS. It enables the creation of circuitry using single atoms as elementary components. Since current computational technologies approach their physical limits, SiDBs have attracted significant interest from both academia and industry. More precisely, single SiDBs allow for realizing Boolean functionality. They form gates, which are then used as fundamental building blocks to realize arbitrary circuit logic. However, although fabrication capabilities are advancing rapidly and initial design automation methodologies have been proposed, the current design of these gates is primarily based on manual methods. This paper presents an approach capable of designing SiDB gates using the minimum number of SiDBs possible for a given Boolean function, thus minimizing gate cost. In addition to the guaranteed minimality, this allows the design of SiDB gates that require significantly fewer SiDBs than the gate designs currently used in the state of the art. This breakthrough simplifies SiDB circuit designs and their corresponding manufacturing processes significantly, thereby accelerating the progress of this promising technology.

11:06

Simon Hofmann, Marcel Walter and Robert Wille

Post-Layout Optimization for Field-Coupled Nanotechnologies

PRESENTER: Simon Hofmann

ABSTRACT. While conventional computing technologies reach their limits, the demand for computation power keeps growing, fueling the interest in post-CMOS technologies. One promising contestant in this domain is Field-coupled Nanocomputing (FCN), which conducts computations based on the repulsion of physical fields at the nanoscale. However, to realize a dedicated functionality in this technology, design methods are needed that create corresponding FCN layouts. While several methods for FCN layout generation have been proposed in the past, the underlying complexity requires them to resort to heuristic approaches, leading to results of sub-par quality and offering room for improvement. In conventional CMOS design, post-layout optimization methods are available to exploit this potential for further improvement. Unfortunately, no such methods exist yet for FCN. In this work, we address this gap and introduce the first post-layout optimization approach for FCN. Experimental evaluations show the benefits of the approach: applied to layouts generated by two complementary state-of-the-art methods, the proposed post-layout optimization allows for a further area reduction of 50.79% and 20.00% on average, respectively, confirming the potential of post-layout optimization for FCN.

11:24

Saad Saleh and Boris Koldehofe

Memristor-based Network Switching Architecture for Energy Efficient Cognitive Computational Models

PRESENTER: Saad Saleh

ABSTRACT. The Internet makes use of high-performance network switches in order to route network traffic from end users to servers. Despite line-rate performance, current switches consume a large amount of energy and lack the ability to support more expressive learning models, like neuromorphic functions. The major reason is the use of transistors in the underlying Ternary Content Addressable Memory (TCAM), which is volatile and supports digital computations only. These shortcomings can be bypassed by developing network memories building on novel components, like memristors, due to their nonvolatile, nanoscale and analog storage/processing characteristics. In this paper, we propose the use of a novel memristor-based pCAMCogniGron memory which provides both digital (deterministic) and analog (probabilistic) outputs for supporting cognitive computational models in network switches. The traditional digital operations can still be supported by a memristor-based energy-efficient TCAM, called TCAmMCogniGron. Building on pCAMCogniGron and TCAmMCogniGron, we propose a novel network switching architecture and analyze its energy efficiency over the experimental dataset of a Nb-doped SrTiO3 memristive device. The results show that the proposed network switching architecture consumes only 0.01 fJ/bit/cell for analog compute operations, which is 50 times less than the transistor-based TCAM.

11:42

Max Uhlmann, Tommaso Rizzi, Jianan Wen, Emilio Pérez-Bosch Quesada, Bakr Al Beattie, Karlheinz Ochs, Eduardo Pérez, Philip Ostrovskyy, Corrado Carta, Christian Wenger and Gerhard Kahmen

LUT-based RRAM Model for Neural Accelerator Circuit Simulation

PRESENTER: Max Uhlmann

ABSTRACT. Neural hardware accelerators have been proven to be energy-efficient when used to solve tasks which can be mapped into an artificial neural network (ANN) structure. Resistive random-access memories (RRAMs) are currently under investigation together with several different memristive devices as promising technologies to build such accelerators combined with complementary metal-oxide-semiconductor (CMOS) technologies in integrated circuits (ICs). While many research groups are actively developing sophisticated physics-based representations to better understand the underlying phenomena characterizing these devices, not much work has been dedicated to exploiting the trade-off between simulation time and accuracy in the definition of computationally inexpensive models suitable for use at many abstraction layers. Indeed, the design of complex mixed-signal systems such as neural hardware accelerators requires frequent interaction between the application level and the circuit level, which can be enabled only with the support of accurate and fast-simulating device models. In this work, we propose a solution to fill the aforementioned gap with a lookup table (LUT)-based Verilog-A model of IHP's 1-transistor-1-RRAM (1T1R) cell. In addition, the implementation challenges of conveying the communication between the abstract ANN simulation and the circuit analysis are tackled with a design flow for resistive neural hardware accelerators that features a custom Python wrapper. As a demonstration of the proposed design flow and 1T1R model, an ANN for the MNIST handwritten digit recognition task is assessed with the last layer verified in circuit simulation. The obtained recognition confidence intervals show a considerable discrepancy between the purely application-level PyTorch simulation and the proposed design flow, which spans the abstraction layers down to the circuit analysis.

12:00

Amirhossein Parvaresh, Shima Hosseinzadeh and Dietmar Fey

Resilience and Precision Assessment of Natural Language Processing Algorithms in Analog In-Memory Computing: A Hardware-Aware Study

PRESENTER: Amirhossein Parvaresh

ABSTRACT. Natural Language Processing (NLP) serves as a cornerstone technology, facilitating complex human-computer interactions, enabling information retrieval, conducting sentiment analysis, and enhancing language comprehension. With the ever-growing use of NLP, the conventional von Neumann computing paradigm is rapidly approaching its inherent limitations. In response, Analog In-Memory Computing (AIMC) emerges as a compelling alternative, albeit accompanied by inherent non-idealities when deploying neural networks on such platforms. In this paper, we have evaluated the precision and resilience of various NLP algorithms when executed within the AIMC framework, both with and without the application of hardware-aware training. Our analysis reveals noteworthy insights: Gated Recurrent Unit (GRU) neural networks exhibit enhanced resilience to noise, yielding an average test error of 3.97% following hardware-aware training, as compared to their full precision counterparts. Conversely, Long Short-Term Memory (LSTM) networks demonstrate a slightly higher average test error of 5.67%, indicating a relatively lower tolerance to non-idealities. In contrast, Convolutional Neural Networks (CNNs) manifest a heightened vulnerability, exhibiting an average relative test error of 13.34%. Furthermore, we systematically investigate the sensitivity profiles of the selected neural networks in the presence of specific non-idealities, providing valuable insights into their robustness and susceptibility within the AIMC environment.

12:18

Shuo Ran, Bi Wu, Ke Chen and Weiqiang Liu

VLCP: A High-Performance FPGA-based CNN Accelerator with Vector-level Cluster Pruning

PRESENTER: Bi Wu

ABSTRACT. Convolutional neural networks (CNNs) are widely used in computer vision, natural language processing, and other application scenarios. But deploying CNNs at the edge is challenging due to their large number of parameters. Pruning is a solution that can effectively reduce the number of parameters and off-chip memory accesses. However, high-sparsity unstructured pruning is not hardware-friendly, while structured pruning has low compression efficiency. As a result, vector-level pruning, with a coarser granularity, is a promising alternative that balances pruning performance and hardware-friendliness. In this paper, a hardware-oriented vector-level pruning strategy is proposed based on the CNN vector distribution properties. By expanding the dynamic range of vector groups, more important weights can be preserved without sacrificing accuracy. When applied to the VGG-16 and ResNet-18 models on the ImageNet dataset, the proposed strategy achieved 10.93X and 10.17X compression ratios in convolutional layers with a 66% reduction in computation and an acceptable drop in top-1 accuracy. Furthermore, the proposed pruning scheme achieves a remarkable performance of 188 FPS on the VCU118 evaluation board, demonstrating its compatibility with hardware. Compared to the state of the art, the proposed strategy reaches a 69% performance improvement and up to 2.8X higher LUT efficiency.

12:36-13:30 Lunch (incl. Group picture)

13:30-14:15 Session 3: Invited Speaker: Hussam Amrouch

Chair:

Fernando Corinto

Location: Dülfer Hall

13:30

Hussam Amrouch

In-Memory Computing using Ferroelectric Transistors: Lessons Learnt and Future Trends

ABSTRACT. In the burgeoning realm of artificial intelligence (AI), the pursuit of In-Memory Computing (IMC) is paramount. This relentless pursuit, aimed at catalyzing ultra-fast and energy-efficient AI computations, is emblematic of the cutting-edge innovations at the nexus of Ferroelectric FET (FeFET) technology. In this talk, we will showcase the latest advancements in FeFETs, spanning from traditional IMC-based hardware accelerators to monolithic 3D integration using advanced back-end-of-line (BEOL) thin-film transistors. We will elucidate the inherent challenges posed by ferroelectric stochasticity along with temperature effects, and demonstrate innovative strategies, such as using thermoelectric devices for advanced on-chip cooling, to mitigate their adverse impacts, paving the way for reliable computing using FeFET-based IMC.

14:15-14:45 Coffee Break

14:45-16:15 Session 4: Industry Session: Panelist Talks

Chair:

Ronald Tetzlaff

Location: Dülfer Hall

16:15-16:30 Coffee Break

16:30-17:30 Session 5: Industry Session: Panel Discussion

Chair:

Ronald Tetzlaff

Location: Dülfer Hall

17:30-19:00 Welcome Reception

Location: Dülfer Hall

19:30-20:30 Guided City Tour with Stollen baker Grete

Location: Kronentor (Zwinger)