Transceiver Architectures for Future System Interconnect Demands
ABSTRACT. Future high-throughput systems will require significant increases in interconnect bandwidth. Current electrical SERDES transceivers operate in excess of 200 Gb/s for communication inside server racks and from large switch chips to optical modules, but there are open questions about how to scale to higher data rates. Longer-reach signaling is possible with optical transceivers, which have traditionally been pluggable modules but are now migrating inside the package to meet interconnect bandwidth requirements more efficiently. This talk gives an overview of state-of-the-art transceiver design techniques for these interconnect systems.
Optimized GPU Accelerated Computing Using FP32 for 3D Laguerre-FDTD Method
ABSTRACT. This paper presents an efficient GPU-accelerated framework for solving the linear systems arising from the Laguerre-FDTD method using single-precision floating-point (FP32) arithmetic. The Laguerre-FDTD method transforms Maxwell’s equations into a sequence of sparse linear systems, which are computationally demanding to solve using a direct solver. To address this challenge, we investigate the performance and accuracy trade-offs of implementing an optimized biconjugate gradient stabilized (BiCGSTAB) solver in FP32 on modern CUDA-enabled GPUs. Numerical results show that FP32 computation on GPUs achieves substantial speedup compared to double precision (FP64), with minimal degradation in accuracy for high-frequency electromagnetic applications.
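As a rough illustration of the FP32 versus FP64 trade-off described in this abstract, the following is a minimal CPU sketch of an unpreconditioned BiCGSTAB iteration in NumPy. It is not the authors' optimized CUDA solver: the small, dense, diagonally dominant matrix is only an assumed stand-in for a Laguerre-FDTD system, and the function name `bicgstab`, the tolerance, and the problem size are choices made for this example.

```python
import numpy as np

def bicgstab(A, b, tol=1e-5, max_iter=500):
    """Unpreconditioned BiCGSTAB; arithmetic runs in the dtype of A and b."""
    x = np.zeros_like(b)
    r = b - A @ x
    r_hat = r.copy()                      # fixed shadow residual
    rho = alpha = omega = b.dtype.type(1)
    v = np.zeros_like(b)
    p = np.zeros_like(b)
    b_norm = np.linalg.norm(b)
    for it in range(1, max_iter + 1):
        rho_new = r_hat @ r
        beta = (rho_new / rho) * (alpha / omega)
        p = r + beta * (p - omega * v)
        v = A @ p
        alpha = rho_new / (r_hat @ v)
        s = r - alpha * v
        if np.linalg.norm(s) <= tol * b_norm:   # converged after the first half-step
            return x + alpha * p, it
        t = A @ s
        omega = (t @ s) / (t @ t)
        x = x + alpha * p + omega * s
        r = s - omega * t
        if np.linalg.norm(r) <= tol * b_norm:
            return x, it
        rho = rho_new
    return x, max_iter

# Assumed stand-in system: nonsymmetric, diagonally dominant tridiagonal matrix.
n = 200
A64 = 4.0 * np.eye(n) - 1.0 * np.eye(n, k=1) - 0.5 * np.eye(n, k=-1)
rng = np.random.default_rng(0)
x_true = rng.standard_normal(n)
b64 = A64 @ x_true

x64, it64 = bicgstab(A64, b64)
x32, it32 = bicgstab(A64.astype(np.float32), b64.astype(np.float32))

def rel_err(x):
    """Relative error against the known solution, evaluated in FP64."""
    return float(np.linalg.norm(x - x_true) / np.linalg.norm(x_true))

print(f"FP64: {it64} iterations, relative error {rel_err(x64):.2e}")
print(f"FP32: {it32} iterations, relative error {rel_err(x32):.2e}")
```

On a well-conditioned system like this one, both precisions reach the requested residual tolerance. How the comparison plays out for actual Laguerre-FDTD matrices depends on their conditioning, which this sketch does not model.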
A Multiresolution Preconditioner for the Electromagnetic Analysis of Large Interconnects
ABSTRACT. This work proposes a multiresolution preconditioner for the electromagnetic characterization of electrical interconnects and packages. The multiresolution preconditioner is selected due to its stability with respect to frequency and mesh density, as well as its suitability for parallelization. A lumped port model for this preconditioner is devised and validated to enable the extraction of scattering parameters. Execution times are compared to the augmented electric field integral equation, a popular formulation stabilized at low frequencies with a factorization-based preconditioner.
Trainable Activation Functions with Applications in the Design of 3D Electromagnetic Structures
ABSTRACT. In 3D EM structure design, predicting high-dimensional frequency responses from low-dimensional parameters requires up-sampling. Traditional deep transposed convolutional neural network solutions tend to suffer from gradient vanishing during training. We propose trainable Sigmoid and Tanh activation functions to address this. Applied to an on-chip microstrip line, our method improves prediction accuracy by 25.6% with similar training time.
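The abstract does not say how the trainable activations are parameterized. As a hedged sketch, the example below assumes a sigmoid with a learnable amplitude `a` and slope `b`, f(x) = a * sigmoid(b * x); the class name `TrainableSigmoid` and this two-parameter form are assumptions for illustration, not the authors' definition. With this form, the peak input gradient is a * b / 4 rather than the fixed 1/4 of a plain sigmoid, which is one plausible way learnable parameters could counteract vanishing gradients.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class TrainableSigmoid:
    """Assumed form f(x) = a * sigmoid(b * x), with learnable a and b."""

    def __init__(self, a=1.0, b=1.0):
        self.a = a
        self.b = b

    def forward(self, x):
        return self.a * sigmoid(self.b * x)

    def grads(self, x):
        """Return (df/da, df/db, df/dx) at input x."""
        s = sigmoid(self.b * x)
        df_da = s
        df_db = self.a * x * s * (1.0 - s)
        df_dx = self.a * self.b * s * (1.0 - s)
        return df_da, df_db, df_dx

act = TrainableSigmoid(a=2.0, b=3.0)
# At x = 0 the sigmoid equals 0.5, so df/dx = a * b / 4.
print(act.grads(0.0)[2])
```

In a training loop, `a` and `b` would be updated with the same optimizer as the network weights, using `df_da` and `df_db` in the chain rule.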
Broadband Impedance Response Extraction of On-Chip Interdigital Capacitors using a 3-D DSA Operator for Piecewise Homogeneous Structures
ABSTRACT. In this contribution, an enhanced 3-D differential surface admittance operator is proposed, facilitating accurate modeling of piecewise homogeneous cuboidal objects. By exploiting the analytical properties of entire-domain basis functions, material interfaces are effectively eliminated from the formulation, leading to a reduction in the number of unknowns without compromising the accuracy of the operator. After a validation of the novel approach, its effectiveness is demonstrated through the analysis of the impedance responses of on-chip interdigital capacitor structures.
Efficient Hierarchical Skeleton-Based Low Rank Decomposition Technique for Integral Equations
ABSTRACT. An efficient kernel-independent methodology for accelerating matrix multiplication in integral equation solvers is presented. The proposed methodology builds on the idea of nested adaptive skeletonization and applies further compression using a merged interaction list and adaptive cross approximation, demonstrating superior performance over a traditional low-rank decomposition technique while preserving accuracy.
Tensor Train Accelerated Solution of Volume Integral Equation for 2D Magneto-Quasi-Static Characterization of Multiconductor Transmission Lines with Logarithmic Complexity
ABSTRACT. We introduce a computational framework for solving magneto‑quasistatic problems on arbitrarily shaped objects with polylogarithmic O(log^p N) complexity in CPU time and memory, where N denotes the number of basis and testing functions in the Method of Moments (MoM) discretization of the relevant volume integral equations (VIEs). This significant reduction in computational cost and storage is achieved by applying tensor‑train (TT) decompositions to the MoM matrices and vectors, coupled with tailored linear‑algebra routines operating directly on these tensorized representations. To construct the TT format, we first organize MoM operators and fields into multidimensional arrays by recursively partitioning the computational domain into a regular grid of square voxels and mapping basis functions accordingly. Although such hierarchical representations alone generally yield TT‑core ranks growing as O(N) for most nontrivial geometries, we demonstrate that applying a global Gaussian smoothing to the discontinuous material‑contrast profile confines core ranks to polylogarithmic growth O(log^p N). Numerical experiments on TT‑accelerated MoM solutions of both full‑wave and quasi‑magnetostatic VIEs confirm that our approach attains overall polylogarithmic scaling in N for objects and material distributions ranging from simple shapes to complex fractal geometries. This advancement paves the way for efficient, TT-based MoM solvers in practical electromagnetic analysis applications.
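To make the tensor-train idea concrete, the sketch below applies the standard sequential-SVD construction (often called TT-SVD) to a smooth 1D function sampled on 2^10 points and folded into a 10-dimensional array with mode size 2. This is a generic illustration, not the authors' MoM solver: the test function, the tolerance, and the helper names `tt_svd` and `tt_to_full` are assumptions for this example. A sampled sine has TT ranks of at most 2 under this folding, so the cores store far fewer numbers than the full vector.

```python
import numpy as np

def tt_svd(tensor, tol=1e-10):
    """Sequential-SVD TT decomposition; cores have shape (r_prev, n_k, r_next)."""
    shape = tensor.shape
    d = len(shape)
    cores = []
    r_prev = 1
    mat = tensor.reshape(r_prev * shape[0], -1)
    for k in range(d - 1):
        U, S, Vt = np.linalg.svd(mat, full_matrices=False)
        rank = max(1, int(np.sum(S > tol * S[0])))   # relative truncation
        cores.append(U[:, :rank].reshape(r_prev, shape[k], rank))
        r_prev = rank
        # Carry the remainder S * Vt forward and fold in the next mode.
        mat = (S[:rank, None] * Vt[:rank]).reshape(r_prev * shape[k + 1], -1)
    cores.append(mat.reshape(r_prev, shape[-1], 1))
    return cores

def tt_to_full(cores):
    """Contract the TT cores back into the full array."""
    full = cores[0]
    for core in cores[1:]:
        full = np.tensordot(full, core, axes=([-1], [0]))
    return full.squeeze(axis=(0, -1))

d = 10
N = 2 ** d
samples = np.sin(3.0 * np.arange(N) / N)     # smooth test function (assumed)
cores = tt_svd(samples.reshape((2,) * d))

ranks = [c.shape[2] for c in cores[:-1]]
stored = sum(c.size for c in cores)
error = float(np.max(np.abs(tt_to_full(cores).reshape(N) - samples)))
print("TT ranks:", ranks)
print("stored entries:", stored, "of", N)
print("max reconstruction error:", error)
```

The low ranks here follow from the sine addition formula, which splits the function exactly across the binary digits of the sample index. The abstract's claim concerns the much harder case of MoM operators with discontinuous material profiles, which this toy example does not attempt to reproduce.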
Bandit Learning-Driven Power Gating with State Retention for High Performance Computing
ABSTRACT. In sub-10 nm nodes, leakage power continues to be a critical concern, especially in high-performance computing (HPC) and advanced 2.5D/3D integration, where traditional power gating techniques struggle with adaptability and responsiveness. This paper proposes a multi-mode state-retentive power gating architecture integrated with an online bandit learning algorithm for real-time sleep-mode selection under various workload patterns. Unlike heuristics or pre-trained models, the proposed multi-armed bandit (MAB) controller adapts on the fly to stochastic idle distributions resulting from dynamic memory access patterns and decentralized dynamic voltage and frequency scaling (DVFS) operations. Implemented in a 65 nm CMOS technology node and validated in Cadence Virtuoso on SRAM logic, the design demonstrates leakage reduction of 92.5%–97.9% compared to the system without power gating and up to 83.5% energy delay product (EDP) savings compared to state-of-the-art (SOTA) control approaches. The approach is robust against process, voltage, and temperature (PVT) variations and offers a lightweight, scalable solution for HPC workloads.
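The abstract does not name the specific bandit algorithm, so the sketch below uses UCB1 purely as an assumed example of an online multi-armed bandit choosing among sleep modes. The three modes, their leakage-saving rates and wake-up costs, the uniform idle-length distribution, and the reward scaling are all hypothetical values invented for illustration; none come from the paper.

```python
import math
import random

# Hypothetical sleep modes: (leakage saved per unit idle time, wake-up energy cost).
MODES = {"light": (0.2, 0.0), "retention": (0.9, 0.5), "deep": (1.0, 6.0)}

def reward(mode, idle_len):
    """Net energy saving for one idle period, shifted and scaled into [0, 1]."""
    rate, wake_cost = MODES[mode]
    return (rate * idle_len - wake_cost + 6.0) / 16.0

def run_ucb1(rounds=5000, seed=0):
    rng = random.Random(seed)
    names = list(MODES)
    pulls = {m: 0 for m in names}
    mean = {m: 0.0 for m in names}
    for t in range(1, rounds + 1):
        untried = [m for m in names if pulls[m] == 0]
        if untried:
            mode = untried[0]                      # try every mode once first
        else:
            # UCB1: empirical mean plus an exploration bonus.
            mode = max(names, key=lambda m: mean[m]
                       + math.sqrt(2.0 * math.log(t) / pulls[m]))
        idle_len = rng.uniform(0.0, 10.0)          # assumed idle-length distribution
        r = reward(mode, idle_len)
        pulls[mode] += 1
        mean[mode] += (r - mean[mode]) / pulls[mode]   # incremental average
    return pulls, mean

pulls, mean = run_ucb1()
print("pulls per mode:", pulls)
print("estimated mean reward:", {m: round(v, 3) for m, v in mean.items()})
```

Under these made-up numbers the "retention" mode has the highest expected reward, so the controller should concentrate its selections there while still occasionally sampling the other modes. A hardware controller would likely need a cheaper update rule than floating-point logarithms and square roots.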
Technology Demonstrator for a Glass-Based Quantum System
ABSTRACT. Miniaturization of quantum systems demands the integration of components into compact, robust systems that support scalability and reproducibility. This paper presents a glass-based technology demonstrator incorporating an atom chip (a microdevice for trapping and controlling ultracold atoms for interferometry) developed using Borofloat® 33 glass. The system features a multi-layer architecture assembled via an adhesive-free glass-to-glass laser welding process, ensuring mechanical stability. Advanced laser-based structuring techniques (LIDE and SLE), combined with electroplating and planarization, enable precise copper conductor integration. The demonstrator supports continuous currents up to 2 A, offers optical transparency for laser access in the desired wavelength range, and maintains structural integrity under mechanical stress. This work establishes a foundation for scalable quantum system packaging and outlines future improvements in transparency, stability, and component integration for high-precision quantum applications.
Temperature Effects on Cryogenic Test Fixture Interconnects
ABSTRACT. Measurements of RF and microwave devices at cryogenic temperatures require extra interconnect fixturing, often in the form of low-loss coaxial cables. These interconnects traverse a range of temperatures, from room temperature at approximately 300 K down to a base temperature of 4.2 K in liquid helium. Consequently, these interconnects require time to stabilize in both temperature and electrical performance. This paper characterizes the time- and temperature-dependent effects on the electrical performance of the fixture interconnects and delineates the effect of interconnect thermalization from that of the device under test.
On-Chip Passive Dispersive Delay Lines for RF Chirp Pulse Shaping
ABSTRACT. This paper presents the design, simulation and measurement of on-chip passive linear dispersive delay lines. The dispersive lines are synthesized in terms of a cascade of customized bridged T-coil (BTC) cells, each with an appropriate peaked group delay response. To demonstrate the effectiveness of the design approach, dispersive delay lines with both positive and negative linear slope have been designed in the Tower Semiconductor 0.18 μm SiGe BiCMOS process and characterized by full-wave electromagnetic (EM) simulation. A 5-cell cascaded BTC dispersive delay line with negative slope has also been fabricated and measured, showing a linear group delay response from 165 ps to 65 ps over the 12 GHz to 30 GHz frequency range with an insertion loss below 5.7 dB and return loss better than 15.5 dB. Time-domain simulations using the measured S-parameters show a 100 ps chirp pulse compression over a 1 ns input chirp pulse width, demonstrating the viability of this on-chip dispersive line design approach for analog signal processing.
On-Chip Tunable Delay Line Using Complementary Bridged T-Coils with Inductor-Capacitor Tuning
ABSTRACT. This paper presents the design of a fully tunable passive delay line incorporating both inductor and capacitor tuning mechanisms in a complementary bridged T-coil (BTC) configuration. Particular emphasis is placed on the layout implementation of a switchable inductor topology integrated into the complementary BTC. The proposed tunable delay line is demonstrated with a design in the Tower Semiconductor 180 nm SiGe BiCMOS technology that provides two switched group delay states, 150 ps and 200 ps, across the Ku-band (12–18 GHz) using only three cascaded complementary BTC cells. Full-wave electromagnetic simulations show a maximum delay variation of 7% with insertion loss below 9 dB and return loss above 12 dB for both delay states. The proposed approach enables the realization of compact on-chip tunable delay lines without compromising impedance matching or bandwidth.
Impulse-Response De-Embedding Correction For Non-Identical Fixturing
ABSTRACT. De-embedding is an approach for removing the electrical effects of a fixture from a device under test within that fixture. The process uses two S-parameter measurements, the first on an isolated fixture and the second on the device within a fixture. The exact same fixture often cannot be used for these two measurements. Our approach assumes that the measurements were made in two similar-but-different fixtures and uses impulse responses to devise a fixture model that compensates for the differences often observed. We demonstrate that our approach improves de-embedding accuracy when using similar but not identical fixtures.