Have you ever wondered what the future of the Internet will look like? Well, the folks at Meta have, and we’re fortunate to have some of their key thought leaders joining us for a Fireside Chat on Wednesday morning for the keynote session.
10:30 | IEA-Plot: Conducting Wafer-Based Data Analytics Through Chat PRESENTER: Yueling Zeng ABSTRACT. This paper presents key ideas behind IEA-Plot, a software framework designed to conduct test data analytics through chat. We use wafer-based analytics as an example application to discuss the ideas. IEA-Plot interacts with a user through a dialog and produces plots according to user instructions. At the core of IEA-Plot is a knowledge graph connecting a frontend natural language parser to a backend API. This knowledge graph captures our analytics knowledge in the specific domain. Usage examples are presented based on test data collected from a recent production line. Note: The IEA (Intelligent Engineering Assistant) was proposed in ITC 2018. IEA-Plot is a redesign of the original IEA, using the latest language technologies such as GPT-3.5 and S-BERT. The backend API includes the Minions analytic approach (published in ITC 2021) to support feedforward and feedback wafer pattern based analytics (published in ITC 2022). The use of a knowledge graph is essential in the current design. IEA-Plot is not just a tool, but a framework showing how an IEA can be built to leverage the power of the latest language models and, at the same time, incorporate domain-specific knowledge (in our case, the knowledge about the specific test data analytics) into an AI agent. |
11:00 | Improving Efficiency and Robustness of Gaussian Process Based Outlier Detection via Ensemble Learning PRESENTER: Makoto Eiki ABSTRACT. Although automotive semiconductors must comply with the standard dynamic part average testing (DPAT) defined by the Automotive Electronics Council, it remains challenging to detect outliers that deviate from the spatial trend within a wafer. Outlier detection using Gaussian process (GP) regression has recently been proposed and outperformed DPAT. However, the detection performance degrades when faulty large-scale integrations are densely included in the regression. Furthermore, the applicable test items are limited because of the long computation time for regression. We propose an outlier detection method by applying ensemble learning to GP regression for simultaneously improving the detection performance and shortening the learning time. Experimental results on industrial production test data demonstrate that the proposed method improves the robustness against latent faulty chip detection by 15.6% while reducing the computation time by 98.6% compared with the conventional GP-based method. |
11:30 | Recognizing Wafer Map Patterns using Semi-Supervised Contrastive Learning with Optimized Latent Representation Learning and Data Augmentation PRESENTER: Zihu Wang ABSTRACT. Wafer map analysis is essential for yield improvement in semiconductor manufacturing. However, manually annotating wafer map data is expensive and time-consuming, which drives up the demand for exploring label-efficient methods for wafer analysis. This paper proposes a novel semi-supervised contrastive learning framework for wafer map pattern feature extraction and classification. Our framework uses supervised contrastive learning on a small amount of labeled data to learn a better latent space representation. Furthermore, a dual-encoder latent-space model is incorporated to best optimize the simultaneous use of labeled data, unlabeled data, and varying types of data augmentations for representation learning. Finally, we enrich the semantics of the learned latent representation space by introducing a new "inter-wafer data augmentation". Our experiments show that our method leads by a large performance gap compared to existing wafer pattern recognition techniques, and suggest that superior accuracy may be achieved simply by semi-supervised learning without resorting to labeling-intensive supervised learning. |
10:30 | Wafer-Scale Electrical Characterization of Silicon Quantum Dots from Room to Low Temperatures PRESENTER: Francesco Lorenzelli ABSTRACT. Silicon spin qubits are one of the most promising platforms for large-scale quantum computing. In this platform, a qubit, i.e., the basic unit of quantum information, is associated with the spin of a single electron confined in a region in silicon called a quantum dot. Silicon spin qubit devices must be operated at 40mK in a 3He/4He dilution refrigerator. This requirement results in long cool-down times and increased costs, which slow down the technology development. Our research aims at developing a high-volume, room-temperature screening technique to assess quantum dot variability and select suitable candidates for mK measurements. In this paper, we present transistor measurement results of quantum dots across a 300mm wafer at temperatures ranging from 225K to 300K. We analyze the statistical distribution of transistor metrics to detect outliers across temperatures, to prevent wasting measurement time and resources at mK on known bad devices. |
11:00 | GPU-based Concurrent Static Learning PRESENTER: Huaxiao Liang ABSTRACT. Static learning is a learning algorithm for retrieving implicit logical relationships between nodes in a netlist. The learning results play an important role in improving automatic test pattern generation (ATPG), such as increasing fault coverage and reducing pattern count. In this work, we study accelerating static learning on graphics processing units (GPUs). By tailoring to the architectural features of GPUs, an algorithm of concurrent static learning is proposed. Multiple learning jobs are carried out simultaneously or concurrently in the same netlist. Moreover, the forward and backward implications of these concurrent jobs are processed as a whole, which leads to better utilization of the computing resources on GPUs. Experiments show that the algorithm can achieve up to 253x speedup against a single-threaded commercial tool and is about 1.8 times better than existing GPU-based solutions. |
11:30 | Biochip-PUF: Physically Unclonable Functions for Microfluidic Biochips PRESENTER: Navajit Singh Baban ABSTRACT. Flow-based microfluidic biochips (FMBs) have microvalves as key components. The physical characteristics of the microvalves vary instance-to-instance due to the inherent variability of numerous fabrication parameters. In this work, we leverage this unclonable, unpredictable instance-specific system behavior and propose physically unclonable functions (PUFs) for FMBs, namely Biochip-PUFs (Bio-PUFs for short). We utilize variability in the microvalve membrane deflection response associated with the actuation pressure challenge as our Bio-PUF parameter. Based on the distributions of the parameters measured on actual FMBs, we complement our Bio-PUF measurements via simulations of the FMB’s microvalves in Comsol Multiphysics. Furthermore, we present a scheme based on the transient response of the microvalve actuation to augment the Bio-PUF authentication. The major advantage of this scheme is that we do not need any additional hardware to generate/implement the PUF module. The biochip itself can act as a PUF instance while continuing to operate in normal functioning mode. |
Abstract: The session will discuss hardware, communication and societal challenges to enable an immersive and inclusive Metaverse
Talk 1: “Addressing Metaverse Hardware Reliability challenges with Silicon Lifecycle Management”, Jyotika Athavale (Synopsys, USA), Yervant Zorian (Synopsys, USA)
Talk 2: “Bridging Technology and Humanity to Cultivate an Inclusive Metaverse”, Jeewika Ranaweera (IEEE Future Directions)
Talk 3: "Successes and challenges of making the Metaverse accessible everywhere and for everyone", Nikolai Leung (Qualcomm, USA), Dr. Bojan Vrcelj (Qualcomm, USA), Dr. Imed Bouazizi (Qualcomm, USA), Dr. Peerapol Tinnakornsrisuphap (Qualcomm, USA), Dr. Prashanth Hande (Qualcomm, USA), Dr. Thomas Stockhammer (Qualcomm, Germany)
10:30 | Understanding and Improving GPUs’ Reliability Combining Beam Experiments with Fault Simulation PRESENTER: Fernando Santos ABSTRACT. Graphics Processing Units (GPUs) are essential in High-Performance Computing (HPC) and safety-critical applications. This market shift led to significant improvements in the programming frameworks and tools, and to concerns about their reliability. However, GPUs’ high complexity poses challenges in evaluating their reliability. We conducted the first cross-layer GPU reliability evaluation to unveil (and mitigate) GPU vulnerabilities. The proposed evaluation is achieved by comparing and combining extensive neutron beam experiments, fault simulation, and application profiling. Based on this detailed analysis, a novel methodology to estimate the FIT rate of GPU applications is proposed. The cross-layer evaluation enables two novel hardening solutions: (1) Reduced Precision Duplication With Comparison (RP-DWC) executes a redundant copy in reduced precision. RP-DWC delivers excellent fault coverage, up to 86%, with minimal execution time and energy consumption overheads (13% and 24%, respectively). (2) Dedicated software solutions for hardening Convolutional Neural Networks (CNNs) can correct up to 98% of the errors. |
11:00 | A Full-Stack Approach for Side-Channel Secure ML Hardware PRESENTER: Anuj Dubey ABSTRACT. Machine learning (ML) has recently emerged as an application with confidentiality requirements due to the high model development costs. The limited research on securing ML hardware makes it a lucrative target for side-channel attacks. Recent works have already shown the possibility of reverse engineering the model internals by exploiting the timing and power side channels. Solving this problem requires analyzing the vulnerabilities in the current ML applications and developing full-stack countermeasures from the ground up: formal security proofs, design and implementation of ML-specific security primitives, practical security validations, and integration of the solution in the context of current ML frameworks. We achieve three objectives: 1) analyze the side-channel vulnerabilities in the blocks of an ML accelerator, 2) design, implement, and validate countermeasures to mitigate those vulnerabilities, and 3) add usability and flexibility to the solution by supporting multiple ML architectures via secure software APIs, and also tape out the final ASIC. |
11:30 | Towards Robust Deep Neural Networks against Design-time and Run-time Failures PRESENTER: Yu Li ABSTRACT. Deep Neural Networks (DNNs) have gained widespread adoption, but they also exhibit post-deployment failures that pose risks to property and life. Consequently, enhancing DNN robustness in safety-critical areas is crucial. This paper improves the robustness of DNN models against failures that may arise during either design time or run time. For design failures, we introduce TestRank, an efficient method for identifying DNN design issues. Constrained by test resources, TestRank selects high-quality test cases leveraging intrinsic and contextual attributes of test samples. HybridRepair then offers effective failure repair by selectively annotating failure regions and utilizing semi-supervised learning techniques. To counteract run-time fault injection attacks, we propose D2NN and DeepDyve, which introduce the dual modular redundancy concept to models for protection at the neuron and system levels, respectively. This delicate redundancy achieves lightweight protection for DNN-based systems. Evaluation performed on various image classification datasets demonstrates the effectiveness of our approaches. |
The test community is a diverse one, yet many of us may not be familiar with either the subtleties or the complexities of how best to work in such an environment. In this moderated panel session, we’ll hear several of our test colleagues tell their stories and give us practical suggestions on how we can be better allies, help members of under-represented groups get a better foothold, and create more welcoming and inclusive workplaces. In contrast to (often counterproductive) DEI compliance training, this session focuses on the narratives from people you know in our community and steps you can take to become a more aware coworker.
16:30 | High-Speed, Low-Storage Power and Thermal Predictions for ATPG Test Patterns PRESENTER: Yun-Feng Yang ABSTRACT. High test power causes thermal damage to chips under test. We need power and thermal analyses to ensure thermal safety of ATPG patterns. This requires long runtime and large disk storage because there are many cycles in ATPG patterns. In this paper, we propose power and thermal predictions for test applications. To save runtime, we use multiple ML models and decay surface models for power and thermal predictions, respectively. To save storage, we build features from flip-flop values, so we don’t need internal logic values from gate-level simulation. Our mean absolute percentage error (MAPE) for power prediction is less than 8%. Our mean absolute error (MAE) for thermal prediction is less than 1.2℃. We enable transient thermal analysis of long ATPG patterns, with 75X runtime speedup and 118X storage reduction. Our predictions are scalable with test speed, so they can be used to optimize test time while ensuring thermal safety. |
17:00 | Scan Cell Segmentation based on Reinforcement Learning for Power-Safe Testing of Monolithic 3D ICs PRESENTER: Shao-Chun Hung ABSTRACT. As Moore’s Law approaches its physical limits, monolithic 3D (M3D) integration offers continued power, performance, and density improvements. However, M3D integration can lead to large power supply noise (PSN) in the power distribution network due to high current demand and long conduction paths, leading to PSN-induced voltage droop problems. The PSN-induced voltage droop is more severe for at-speed delay testing than for the functional mode. Power-safe testing is therefore essential to prevent good chips from failing on the tester (i.e., yield loss). We propose a scan cell segmentation framework to reduce power consumption during scan capture. We use reinforcement learning to insert scan cell segments that can minimize switching activity without any adverse impact on test coverage. Simulation results for benchmark M3D designs highlight the effectiveness of the proposed framework. |
17:30 | Improving Productivity and Efficiency of SSD Manufacturing Self-Test Process by Learning-based Proactive Defect Prediction PRESENTER: Yunfei Gu ABSTRACT. In the recent storage market, Flash-based Solid State Drives (SSDs) have become high-performance alternatives to Hard Disk Drives (HDDs), dramatically increasing SSD shipments. To guarantee product reliability and quality and to remain competitive, SSD manufacturers invest significant effort in technology qualification and reliability design, especially in Manufacturing Self-Test (MST) processes. However, the cost of the MST process becomes more prominent as the memory density of SSDs increases. In this paper, we study the MST data of over 20,000 SSDs and propose a novel and economical approach to dynamically reduce the MST overhead by proactive infant defect prediction based on a Generative Adversarial Network-Attention based Spatial-Temporal Sequence-to-Sequence network (GAN-ASTSeq). It reduces the temporal cost by 80.2% (i.e., improves the efficiency by 4x) while maintaining an outstanding detection rate of defects. |
16:30 | Magnetic Coupling Based Test Development for Contact and Interconnect Defects in STT-MRAMs PRESENTER: Sicong Yuan ABSTRACT. The mass production of Spin-Transfer Torque Magnetic RAMs (STT-MRAMs) requires high-quality test solutions. Accurate and appropriate fault modeling is crucial for the realization of such solutions. This paper targets the fault modeling and test generation for all interconnect and contact defects in STT-MRAMs and shows that using defect injection (as linear resistors) and circuit simulation for fault modeling without incorporating the impact of magnetic coupling results in an incomplete, and hence insufficiently accurate, set of fault models. Magnetic coupling is an inherent property of STT-MRAM (by stray field) and may foster the occurrence of additional memory faults. Not considering magnetic coupling will clearly give rise to additional test escapes. The paper introduces a compact model for STT-MRAM that incorporates the intra- and inter-cell stray field, uses this model to derive the whole set of fault models for interconnect and contact defects, and finally proposes an efficient test solution. |
17:00 | Device-Aware Test for Ion Depletion Defects in RRAMs PRESENTER: Hanzhi Xun ABSTRACT. The mass production of Resistive Random Access Memory (RRAM) devices is impacted by the emergence of unique defects and faults. Hence, there is a pressing need for a comprehensive understanding of manufacturing defects and the development of high-quality RRAM tests. This paper introduces and characterizes a new defect called Ion Depletion (ID); this defect leads to a reduction in the high resistance state and not the low resistance state, while its non-linear electrical characteristics cannot be effectively described through traditional fault modeling and test approaches. To address this challenge, a Device-Aware (DA) defect model is developed, incorporated into a Verilog-A RRAM model, and calibrated using measurements to accurately describe the physical behavior. Afterward, fault analysis is performed based on the DA defect model, with simulation results demonstrating that the ID defect may sensitize undefined state faults. Finally, dedicated test and diagnosis solutions for the ID defect are proposed. |
17:30 | Analysis and Characterization of Defects in FeFETs PRESENTER: Dhruv Thapar ABSTRACT. Emerging devices are susceptible to manufacturing defects due to immature fabrication processes. Ferroelectric field-effect transistors, referred to as FeFETs, are promising emerging devices, but the impact of manufacturing imperfections on these devices has yet to be studied. Thus, we combine a technology CAD (TCAD) model with a fault-injection technique to represent fabrication defects in a FeFET. The TCAD model is calibrated against a fabricated metal-ferroelectric-metal capacitor and uses a multi-domain ferroelectric-layer structure. We address two classes of defects in the ferroelectric layer and map them to stuck-at-fault models referred to as neutral faults (SAP0) and stuck-at-plus and stuck-at-minus (SAP+ and SAP−) faults. We also develop a machine-learning framework to characterize these fault-injected FeFET devices. Our study of defects in FeFETs, the first of its kind, and the insights gained from it can provide valuable feedback for the fabrication and yield learning of FeFET-based circuits. |
1- Speaker: Haralampos-G. Stratigopoulos (Sorbonne Université, CNRS, LIP6)
Title: Functional Safety of Spiking Neural Network VLSI Implementations
2- Speaker: Stefano Di Carlo (Politecnico di Torino)
Title: Safety of AI and AI for safety: from CMOS to emerging technologies in neural networks
3- Speaker: Yervant Zorian (Synopsys)
The explosion of Artificial Intelligence applications has already touched the design and test communities and may be poised to fundamentally change the way that we test engineers approach our jobs. The panelists in this session will catalog some of the existing tasks where AI has made an impact, then speculate on others that may be next on the list. The underlying question that the discussion with the audience will seek to answer is “Will AI take my job away -or- make me better at doing my job?”
AI for the front-end (DFT, ATPG, Fault simulation) (3 x 15-minute talks)
Krishna Chakravadhanula (Cadence Design Systems)
Rahul Singhal (Synopsys)
Ron Press (Siemens)
AI for the back-end (diagnosis, debug, yield enhancement) (3 x 15-minute talks)
Narender Hanchate (Cadence Design Systems)
Guy Cortez (Synopsys)
Jayant D’Souza (Siemens)