10:15 | A Deterministic Parallel Routing Approach for Accelerating Pathfinder-based Algorithms PRESENTER: Umair F. Siddiqi ABSTRACT. Routing is a time-consuming step in the FPGA design flow; its goal is to build non-overlapping routing trees for all nets. PathFinder is a popular routing algorithm, and it is implemented in the versatile-place-and-route (VPR) tool. The latest version of PathFinder, implemented in VPR 8.0, employs incremental routing, in which it rips up and re-routes (RnR) only those branches of the routing trees that contain a congested node or whose delay has degraded significantly in recent iterations. The initial iterations have a very high workload (i.e., the number of branches to route), while the later ones have fewer branches to build. We propose a parallel-sequential hybrid router for PathFinder with incremental routing that applies deterministic parallel routing to a window of initial iterations having a high routing workload and sequential routing to the remaining iterations. It also uses an intelligent approach to select nets for sequential and parallel routing. Experiments conducted using the Titan benchmarks show that it can improve the runtime of PathFinder by up to 32% with no significant degradation in solution quality. |
10:33 | ABSTRACT. Efficient target localization is both a requirement and a challenge in applications such as autonomous vehicle systems, defence, and indoor positioning. Radars and lidars are used in such applications, where range-Doppler matrix representations are common and have proved very useful. In this paper, we address the problem of target localization from a two-dimensional (2D) range-Doppler matrix in latency-sensitive radar systems that support hardware acceleration. The proposed algorithm consists of three key steps, namely (a) initial background noise filtering and binarization, (b) artifact filtering, and (c) 2D target localization. We use K-means clustering and morphology to filter background noise and artifacts, respectively. To implement the 2D target localization effectively, we propose a novel two-layer connected components algorithm, which reduces the computations, and hence the latency, of target localization by up to 64.37% compared to the conventional connected components algorithm. Simulation results are provided to demonstrate the performance of the algorithm in the presence of additive noise, and this performance is compared with that of existing methods. |
10:51 | Towards Robust Process Design Kits with a Scalable DevOps Quality Assurance Platform ABSTRACT. Process design kits (PDKs) and their robustness verification are pivotal to a semiconductor foundry’s growth and customer retention. This paper presents an automated PDK quality assurance (QA) platform that is based on a continuous integration and continuous delivery tool. The tool helps to keep a PDK at production quality, so that it can be deployed at any time. The introduced methodology allows problems to be detected and resolved at earlier stages, while significantly reducing the time required for pre-release PDK verification. The QA platform was embedded into a PDK verification flow for 0.13 µm and 0.25 µm SiGe BiCMOS technologies, resulting in reliable PDK releases on demand. Moreover, we use the proposed PDK QA platform to verify the interoperable PDK, using a formerly released PDK as a reference. |
11:09 | A 16-bit Floating-Point Near-SRAM Architecture for Low-power Sparse Matrix-Vector Multiplication ABSTRACT. State-of-the-art Artificial Intelligence (AI) algorithms, such as graph neural networks and recommendation systems, require floating-point computation of very large matrix multiplications over sparse data. Their execution in resource-constrained scenarios, like edge AI systems, requires a) careful optimization of computing patterns, leveraging sparsity as an opportunity to lower computational requirements, and b) dedicated hardware. In this paper, we introduce a novel near-memory floating-point computing architecture dedicated to the parallel processing of sparse matrix-vector multiplication (SpMV). This architecture can be integrated at the periphery of memory arrays to exploit the inherent parallelism of memory structures to speed up computation. In addition, it uses its proximity to memory to achieve high computational capability and very low latency. The illustrated implementation, operating at 1 GHz, can compute up to 370 MFLOPS (millions of floating-point operations per second) on SpMV workloads, while incurring a modest 17% area overhead when interfaced with a 4 KB SRAM array. |
11:27 | Reconfigurable Rectifier for RF Energy Harvesting System at WiFi-6 Frequency Band for 2.5 V ABSTRACT. In this paper, a novel RF energy harvesting system is presented, designed to operate within the WiFi-6 frequency band. The system incorporates reconfigurable rectifier stages, offering flexibility and adaptability. Along with the rectifier stages, it includes a comparator, a ring counter, and an LDO. The system is implemented in TSMC 180 nm technology, operates at 2.4 GHz, and maintains a steady output voltage of 2.3 V. In this design, the rectifier stages are adjusted so that the input voltage of the LDO stays around 2.5 V, accounting for a 200 mV dropout voltage. This adjustment significantly enhances the efficiency of the LDO. Additionally, the rectifier’s conversion ratio is configured to compensate for any impedance mismatch caused by variations in the input power. The impedance of the rectifier is derived through a large-signal SP analysis, allowing appropriate impedance matching network values to be determined. As a result, the system achieves a sensitivity of -14 dBm while maintaining an output voltage of 2.3 V. |
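The voltage figures quoted in the 11:27 abstract can be checked with a short worked equation. The symbols $V_{\mathrm{in}}$, $V_{\mathrm{out}}$, $V_{\mathrm{DO}}$, and $\eta_{\mathrm{LDO}}$ are notation introduced here, not taken from the abstract, and the efficiency estimate assumes the LDO's quiescent current is negligible compared with the load current, which the abstract does not state.

```latex
\begin{align}
  V_{\mathrm{in}} &= V_{\mathrm{out}} + V_{\mathrm{DO}}
                   = 2.3\,\mathrm{V} + 0.2\,\mathrm{V}
                   = 2.5\,\mathrm{V}, \\
  \eta_{\mathrm{LDO}} &\approx \frac{V_{\mathrm{out}}}{V_{\mathrm{in}}}
                       = \frac{2.3\,\mathrm{V}}{2.5\,\mathrm{V}}
                       = 0.92 .
\end{align}
```

Under that assumption, holding the LDO input near 2.5 V, rather than letting it rise with the input power, keeps the regulator close to a best-case efficiency of about 92%.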
Said Hamdioui (Delft University of Technology, Netherlands)
10:15 | Resistive RAM for computing: How reliable? |