TTEP Tutorial by Sandeep Goel (TSMC) & Yervant Zorian (Synopsys)
Advancements in process technology have enabled the creation of chips with billions of transistors, significantly enhancing power and performance for high-performance computing (HPC) and AI applications. This complexity has spurred the development of various 3D integration and packaging techniques utilizing multi-die/chiplet-based designs. Advanced 3D integration technologies allow for the construction of multi-die systems, and each technology offers specific advantages and trade-offs in terms of performance, application, and cost. Like traditional chips, all 3DICs must be rigorously tested for manufacturing defects. This includes Known-Good-Die (KGD) testing before stacking, Known-Good-Stack (KGS) testing after stacking, final tests, and system-level tests. Furthermore, given the complexity of the stacking process, in-silicon monitoring solutions are necessary to continuously check silicon health during in-field operation. This tutorial offers an overview of advanced packaging technologies and explores the associated test flow challenges. It presents an example of how the 3Dblox open standard simplifies the description of a 3D stack, enabling interoperability between EDA tools and allowing various test optimizations. Additionally, it covers various Design-for-Test (DFT) schemes, sensors/monitors, and embedded test & repair solutions to facilitate efficient testing across different packaging configurations.
More Information: http://ttep.tttc-events.org/ttep/tutorials.html#tutorial7
TTEP Tutorial by Adit Singh (Auburn University)
New types of failures that evade traditional scan DFT tests are increasingly observed in advanced SoC designs. Recent reports from Google and Meta have highlighted significant levels of silent data corruption in large-scale data centers. These failures have been associated with specific processor cores in the processor networks, suggesting faulty or unstable hardware, rather than malfunction caused by random environmental noise. Although the limitations of structural scan tests have led to the growing adoption of System-Level Tests (SLT) as a final manufacturing screen in recent years, even these expensive functional tests allow significant test escapes that can cause failures during operation. We argue that timing marginalities, caused by manufacturing process variations, are a primary contributor to both SLT fallout and in-field failures. To support this claim, we first review existing scan-based timing tests, including recent developments in cell-aware and timing-aware methodologies, highlighting their capabilities and limitations. We then explain why these tests often fail to detect timing-related defects from process variation. Finally, we present research validated using production test data from Intel's 14nm FinFET technology that demonstrates how modifying voltage and timing conditions during scan and system-level testing can improve detection of circuits with marginal timing and thereby minimize failures in operation.
More Information: http://ttep.tttc-events.org/ttep/tutorials.html#tutorial8
TTEP Tutorial by Arani Sinha (Intel), Alberto Bosio (Ecole Centrale de Lyon) & Ernesto Sanchez (Politecnico di Torino)
AI applications have become extremely popular in everyday life as well as in industry, but their complexity requires dedicated hardware accelerators deployed in cloud-based data centers. Recent studies by hyperscalers have revealed that data center hardware can experience failures leading to Silent Data Corruption (SDC). SDCs can impact AI workloads during both training and inference and, eventually, cause huge revenue loss. The tutorial will start with an introduction to Silent Data Corruption. It will then offer an overview of the artificial intelligence landscape, focusing on basic frameworks such as Multi-Layer Perceptrons, Deep Neural Networks, and Transformers. The next part of the tutorial will focus on AI architectures such as the Tensor Processing Unit from Google, the Gaudi architecture from Intel, and the GPU architecture from Nvidia. After that, the impact of manufacturing defects on training and inference will be discussed, and fault injection techniques developed for studying the impact of defects will be described. Next, developments in functional and structural testing of AI architectures will be discussed. Finally, resilience techniques such as gradient clipping, algorithmic fault tolerance, and tensor processing monitor will be described. The tutorial will end with a brief discussion of open research problems in this space.
More Information: http://ttep.tttc-events.org/ttep/tutorials.html#tutorial9
Tell your presenter to take a break
Happy lunch together
TTEP Tutorial by Debendra Das Sharma (Intel) & Yervant Zorian (Synopsys)
High-performance and power efficiency needs of emerging workloads demand on-package integration of heterogeneous processing units, memory, and electrical and optical interconnects. Applications such as artificial intelligence/machine learning, data analytics, 5G, automotive, and high-performance computing are driving these demands to meet the needs of cloud computing, intelligent edge, enterprise, client, and hand-held computing infrastructure. On-package interconnects are a critical component to deliver the power-efficient performance with the right feature set in this evolving landscape.
UCIe is an open industry standard with a fully specified stack that covers plug-and-play interoperability of chiplets on a package, much like the seamless board-level interoperability delivered by well-established and successful off-package interconnect standards such as PCI Express®, Universal Serial Bus (USB)®, and Compute Express Link (CXL)®. Recently, UCIe added significant enhancements for the test and debug infrastructure to work seamlessly across the silicon life cycle. In this tutorial, we will discuss the usages and key metrics of UCIe, both planar and 3D. We will delve into electrical, packaging, protocol, RAS, debug, testability, manageability, and software aspects, along with the compliance and interoperability mechanisms, addressing both inter-die and intra-die requirements. The intended audience for this tutorial is architects, SoC developers, chip designers, DFT & test engineers, researchers, and system integrators.
More Information: http://ttep.tttc-events.org/ttep/tutorials.html#tutorial10
TTEP Tutorial by Mehdi Tahoori (IMEC)
More Information: http://ttep.tttc-events.org/ttep/tutorials.html#tutorial11
TTEP Tutorial by Amit Pandey (Amazon), Karthik Natarjan (Synopsys) & Sankaran Menon (Ericsson)
Infield Test and Debug provides deeper insight into system behavior and structural quality while the system is running in mission mode. It provides a non-intrusive method for testing and debugging complex computer systems. This is especially useful in mission-critical applications such as space, ADAS (Advanced Driver Assistance Systems), and industrial/robotic applications, as well as virtually all real-time and data-center/AI applications. In this tutorial, we will establish the motivation for Infield system Test & Debug and cover the various testing and debug techniques available today. We will then introduce the Infield system test and debug mechanisms available for closed-chassis systems using USB Type-C, PCIe, and other high-speed interfaces. We will conclude with results from Infield system Test and Debug used in real-world applications across various industries.
More Information: http://ttep.tttc-events.org/ttep/tutorials.html#tutorial12
Tell your presenter to take a break
While the ITC community has had a long history with the topics of Machine Learning and Artificial Intelligence, the amazing recent developments in AI have surprised even us. These have led to dire projections from some corners that AI will take our jobs, but it seems much more likely that AI will become an important tool for DFT and test engineers, perhaps even marking an inflection point in the history of the profession. This panel has two main objectives: the first is to discuss the general topic of how AI can practically be incorporated into our day jobs, and the second is to give the audience some real-time experience with a lightly-trained test chatbot who will be one of the panelists (and yes, we do appreciate the irony of this chatbot taking the job of some other panelist, but that’s unavoidable for our purpose here).