VLSI-SOC 2023: 31ST IFIP/IEEE CONFERENCE ON VERY LARGE SCALE INTEGRATION
PROGRAM FOR TUESDAY, OCTOBER 17TH

10:00-10:15 Coffee Break (Main Building Dining Room)
10:15-11:45 Session 16A: Emerging Computing Paradigms
Chairs:
Saddaf Rubab (University of Sharjah, UAE)
Nahla El-Araby (TU Wien, Vienna, Austria and Canadian International College, Cairo, Egypt)
10:15
Rassul Bairamkulov (Ecole Polytechnique Federale de Lausanne, Switzerland)
Alessandro Tempia Calvino (Ecole Polytechnique Federale de Lausanne, Switzerland)
Giovanni De Micheli (Ecole Polytechnique Federale de Lausanne, Switzerland)
Synthesis of SFQ Circuits with Compound Gates

ABSTRACT. Rapid single-flux quantum (RSFQ) is one of the most advanced superconducting technologies with the potential to supplement or replace conventional VLSI systems. However, scaling RSFQ systems up to VLSI complexity is challenging due to fundamental differences between RSFQ and CMOS technologies. Due to the pulse-based nature of the technology, RSFQ systems require gate-level pipelining. Moreover, logic gates have an extremely limited driving capacity. Path balancing and clock distribution constitute a major overhead, often doubling the size of circuits. Gate compounding is a novel technique that substantially enriches the functionality realizable within a single clock cycle. However, standard logic synthesis tools do not support its specific synchronization constraints. In this paper, we first build a database of minimum-area compound gates covering all Boolean functions of up to 4 variables and all possible input arrival patterns. Then, we propose a technology mapping method for RSFQ circuits that exploits compound gates using the database as a cell library. We evaluate our framework on the EPFL and ISCAS benchmark circuits. Our results show, on average, a 33% lower logic depth with 24% smaller area compared to the state of the art.

10:33
Siyi Wang (Nanyang Technological University, Singapore)
Anupam Chattopadhyay (Nanyang Technological University, Singapore)
Reducing Depth of Quantum Adder using Ling Structure
PRESENTER: Siyi Wang

ABSTRACT. Improving the performance of quantum adders is an important technical challenge with major impact on the implementation of efficient, large-scale quantum computing. Continuing along this research direction, we propose a novel parallel-prefix quantum adder based on Ling expansion. We systematically explored classical structures for parallel-prefix adders, assessing their suitability for realization in the quantum domain. Furthermore, the Ling adder requires logical OR and large fan-out, which call for innovative solutions. We addressed these challenges to realize the quantum Ling adder, which results in a T-depth of only O(log(n/2)). This represents a substantial improvement over previous quantum adders based on parallel-prefix structures, which require O(log n) T-depth. We present extensive theoretical and simulation-based studies to establish our claims.

10:51
Giovani Crasby Britton Orozco (STMicroelectronics, TIMA laboratory, Université Grenoble Alpes, France)
Estelle Lauga-Larroze (Univ. Grenoble Alpes, CNRS, Grenoble-INP, TIMA, France)
Salvador Mir (Univ. Grenoble Alpes, CNRS, Grenoble-INP, TIMA, France)
Philippe Galy (STMicroelectronics, France)
Benjamin Dormieu (STMicroelectronics, France)
Quentin Berlingard (Univ. Grenoble Alpes, CEA, LETI, IMEP-LAHC, France)
Mickael Casse (Univ. Grenoble Alpes, CEA, LETI, France)
Noise modeling using look-up tables and DC measurements for cryogenic applications

ABSTRACT. Mature transistor-level compact models for the simulation of integrated circuits at cryogenic temperatures are still lacking. This is particularly the case for the simulation of noise behavior, which is critical for most applications. In this paper, we aim at an efficient prediction of the white noise behavior of basic amplifying stages working at RF frequencies and cryogenic temperatures. For this, we propose the use of DC measurements that are incorporated in a look-up table (LUT) and fed to a mathematical noise model. We illustrate the approach for the case of a transistor in common-source configuration. The results of circuit simulation of the noise parameters in the standard temperature range are very close to the estimation of the same parameters using the LUT with just DC measurements. The approach can be readily extended to the analysis of circuits with multiple components. Next, the LUT approach is used for estimating the noise parameters at cryogenic conditions, considering DC measurements that have been carried out at these temperatures. The paper illustrates the feasibility of carrying out a cryogenic design using a LUT-based approach while accurate compact models are not yet available.

11:09
Omar Numan (Aalto University, Finland)
Martin Andraud (Aalto University, Finland)
Kari Halonen (Aalto University, Finland)
A Self-Calibrated Activation Neuron Topology for Efficient Resistive-Based In-Memory Computing
PRESENTER: Omar Numan

ABSTRACT. In-Memory Computing (IMC) accelerators based on resistive crossbars are emerging as a promising pathway toward improved energy efficiency in artificial neural networks. While significant research efforts are directed toward designing advanced resistive memory devices, the nonidealities associated with practical device implementation are often overlooked. Existing solutions typically compensate for these nonidealities during off-chip training, introducing additional complexities and failing to account for random errors such as noise, device failures, and cycle-to-cycle variability. To tackle this challenge, this work proposes a self-calibrated activation neuron topology that offers a fully online non-linearity compensation for IMC accelerators. The neuron merges multiply-accumulate operations with Rectified Linear Unit (ReLU) activation function in the analog domain for increased efficiency. The self-calibration is integrated into the data conversion process to minimize overheads and be fully online. The proposed activation neuron is designed and simulated using 22 nm FDSOI CMOS technology. The design demonstrates robustness across a wide temperature range (-40°C to 80°C) and under various process corners, with a maximum accuracy loss of 1 LSB for an 8-bit activation accuracy.

11:27
Jawar Singh (Indian Institute of Technology Patna, India)
Ankit Sirohi (Indian Institute of Technology Patna, India)
A Steep Slope Sub-10nm Armchair Phosphorene Nanoribbon FET with Intrinsic Cold Contact

ABSTRACT. The thermionic limit of conventional MOSFETs hinders steep-slope switching, requiring at least 60 mV of gate voltage per decade of current change at room temperature. In this work, we adopted a multiscale simulation approach to investigate sub-10 nm gate-length cold-source field-effect transistors (CS FETs) based on edge-oxidized armchair phosphorene nanoribbons (APNR-O). We showed that the narrow DOS in APNR-O can filter out high-energy electrons, bringing the subthreshold swing (SS) down to a minimum value of 41 mV/decade and an average value of 56 mV/decade over a four-decade increase in drain current (IDS). An on-current (ION) of 1418.5 µA/µm and an ION/IOFF ratio of 2.1 × 10^6 were achieved by the proposed device. Low switching energy (PDP) and fast switching speed (τ) of ∼0.15 fJ/µm and ∼0.35 ps, respectively, were predicted in this work. These results indicate that the APNR-O-based FET can simultaneously fulfill the requirements of the International Roadmap for Devices and Systems (IRDS) for both high-performance (HP) and low-power (LP) devices in 2028. Hence, APNR-O can be a good channel material for more-than-Moore devices.
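As a small worked illustration of the roughly 60 mV figure mentioned in the abstract above, the textbook thermionic limit on subthreshold swing is ln(10)·kT/q. The sketch below only evaluates that formula; the function name is hypothetical and nothing here comes from the authors' multiscale simulation flow.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23       # Boltzmann constant (exact SI value)
ELEMENTARY_CHARGE_C = 1.602176634e-19  # elementary charge (exact SI value)


def thermionic_ss_limit_mv_per_decade(temperature_k):
    """Return ln(10) * k * T / q in mV/decade (hypothetical helper name)."""
    thermal_voltage_v = BOLTZMANN_J_PER_K * temperature_k / ELEMENTARY_CHARGE_C
    return math.log(10) * thermal_voltage_v * 1e3


if __name__ == "__main__":
    # Room temperature is taken as 300 K here, which is an assumption.
    print(thermionic_ss_limit_mv_per_decade(300.0))
```

At 300 K this evaluates to roughly 59.5 mV/decade, so the reported minimum of 41 mV/decade lies below the limit that applies to a conventional thermionic source.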

10:15-11:45 Session 16B: Special Session: Advances and challenges in ferroelectric in memories and computing
Chairs:
Ian O'Connor (Lyon Institute of Nanotechnology, France)
Cédric Marchand (Ecole Centrale Lyon, Institute of nanotechnologies of Lyon, University of Lyon, France)
10:15
Shubham Kumar (IIT Kanpur, India)
Paul R. Genssler (University of Stuttgart, Germany)
Somaya Mansour (University of Stuttgart, Germany)
Yogesh Singh Chauhan (IIT Kanpur, India)
Hussam Amrouch (Technical University of Munich (TUM), Germany)
Frontiers in AI Acceleration: From Approximate Computing to FeFET Monolithic 3D Integration
10:45
Cédric Marchand (Ecole Centrale Lyon, Institute of nanotechnologies of Lyon, University of Lyon, France)
Alban Nicolas (Ecole Centrale Lyon, Institute of nanotechnologies of Lyon, University of Lyon, France)
Paul-Antoine Matrangolo (Ecole Centrale Lyon, Institute of nanotechnologies of Lyon, University of Lyon, France)
David Navarro (Ecole Centrale Lyon, Institute of nanotechnologies of Lyon, University of Lyon, France)
Alberto Bosio (Ecole Centrale Lyon, Institute of nanotechnologies of Lyon, University of Lyon, France)
Ian O'Connor (Ecole Centrale Lyon, Institute of nanotechnologies of Lyon, University of Lyon, France)
FeFET based Logic-in-Memory design methodologies, tools and open challenges
11:15
Jean-Philippe Noel (CEA, France)
Emanuele Valea (CEA, France)
Laurent Grenouillet (CEA, France)
Bastien Chapuis (CEA, France)
Clément Fisher (CEA, France)
Arnaud Recoquillay (CEA, France)
Bastien Giraud (CEA, France)
Compute-In-Place Serial FeRAM: Enhancing Performance, Efficiency and Adaptability in Critical Embedded Systems
11:45-12:15 Lunch Break (Main Building Dining Room)
13:00-14:30 Session 18A: Design for Testing and Reliability
Chairs:
Matteo Sonza Reorda (Politecnico di Torino, Italy)
Sohaib Majzoub (University of Sharjah, UAE)
13:00
Robert Alexander Limas Sierra (Politecnico di Torino, Italy)
Juan David Guerrero Balaguera (Politecnico di Torino, Italy)
Josie Esteban Rodriguez Condia (Politecnico di Torino, Italy)
Matteo Sonza Reorda (Politecnico di Torino, Italy)
Analyzing the Impact of Different Real Number Formats on the Structural Reliability of TCUs in GPUs

ABSTRACT. Graphics Processing Units boost the execution of Neural Networks by using on-chip accelerators (Tensor Core Units, TCUs). Unfortunately, the most advanced semiconductor technologies are increasingly prone to fault effects during in-field operation. Hence, reliability and safety concerns arise in the safety-critical and High-Performance Computing domains. Faults may affect TCUs when processing massive amounts of data by resorting to different floating-point formats. In this scenario, a characterization of the impact of faults when different numeric formats are adopted for TCUs is still missing. This work, for the first time, evaluates the fault effects in TCUs and their impacts when using two real numeric formats (i.e., Floating-Point and Posit). For the experiments, we resort to an architectural description of TCU cores (PyOpenTCU) through 60 fault simulation campaigns, injecting 57,344 faults per campaign and requiring around 24 computation days. The experimental results indicate a relation between the corrupted spatial areas in the output matrices and the TCU's scheduling policies. Moreover, the numeric analysis showed that faults in TCUs mainly affect 2 bits of the output results for both real numeric formats. The results demonstrate that Posit numeric formats are less affected by faults than Floating-Point formats by up to one order of magnitude.

13:18
Sagheer Ahmed (IIT Jammu, India)
Jayesh Ambulkar (IIT Jammu, India)
Debabrata Mondal (IIT Jammu, India)
Ambika Prasad Shah (IIT Jammu, India)
Soft Error Immune with Enhanced Critical Charge SIC14T SRAM Cell for Avionics Applications

ABSTRACT. The impact of high-energy space particles such as cosmic rays and alpha particles can flip the data stored in an SRAM cell. This paper proposes a highly reliable soft error immune with enhanced critical charge 14T (SIC14T) SRAM cell that is radiation-hardened by design and has an increased critical charge that can withstand both single-event upsets (SEU) and single-event multi-node upsets (SEMNU). We compare the performance of the proposed cell with that of other SRAM cells, namely the SRRD12T, RSP14T, SEA14T, and 6T SRAM cells, all simulated in 45-nm CMOS technology in Cadence Virtuoso with a supply voltage of 1 V and an operating temperature of 27°C. Both SEU and SEMNU occurring at the storage nodes of SIC14T are successfully recovered. The proposed SRAM cell has 1.02×, 0.6×, 0.72×, and 4.64× better write stability, read access time, leakage power, and critical charge than the SRRD12T, with a 1.68× area overhead.

13:36
Walaa Amer (University of California Irvine, United States)
Mariam Rakka (University of California Irvine, United States)
Rachid Karami (University of California Irvine, United States)
Minjun Seo (University of California Irvine, United States)
Mazen Saghir (American University of Beirut, Lebanon)
Rouwaida Kanj (American University of Beirut, Lebanon)
Fadi Kurdahi (University of California Irvine, United States)
Hardware Implementation and Evaluation of an Information Processing Factory

ABSTRACT. The Information Processing Factory (IPF) utilizes factory management principles to tackle the complexities of integrated embedded systems, ensuring continuous safe operation and optimization at runtime. This paper presents a hardware implementation of IPF that enables dynamic task migration across system resources, ensuring reliability in the face of internal or external failures. We demonstrate the effectiveness of IPF through the efficient migration of tasks in multiprocessor SoCs using a safety-critical pacemaker application as a case study. Despite the additional software and hardware requirements, implementing IPF in a pacemaker results in comparable reliability to dual modular redundancy (DMR) with faster service resumption and improved resource utilization.

13:54
Gabriel Rutsch (Infineon Technologies AG, Germany)
Konrad Maier (Infineon Technologies AG, Germany)
Wolfgang Ecker (Infineon Technologies AG, Germany)
FPGA-implementation techniques to efficiently test application readiness of mixed-signal products

ABSTRACT. We present FPGA-implementation techniques to efficiently validate application readiness of a product for analog/mixed-signal (AMS) applications. Further, we show how the proposed techniques are used on the example of a power conversion application and compare results including area utilization, timing impact and scalability at increased application complexity with state-of-the-art approaches. The open source synthesizable model generator for mixed-signal blocks msdsl is extended to support reconfigurable variables within a model description. Further, the control API of the open source FPGA prototyping automation anasymod is enhanced to allow updating these variable values on FPGA between test case executions. The end result is a unique framework for application scenario driven product validation that leverages benchmark AMS system simulation throughput on FPGA and enables fast system property sweeping across modeling abstractions at a low overhead via a seamless integration into the existing tools.

14:12
Paolo Bernardi (Politecnico di Torino, Italy)
Giorgio Insinga (Politecnico di Torino, Italy)
Nima Kolahimahmoudi (Politecnico di Torino, Italy)
A Novel Approach to Extract Embedded Memory Design Parameter Through Irradiation Test

ABSTRACT. As the capabilities of modern System-on-Chips (SoCs) improve, their complexity increases. Thus, manufacturers are investing heavily in designing and testing their devices. This complexity is causing a continuous expansion in the size of embedded memory structures. As a result of the shrinking dimensions of the transistors, memories are increasingly susceptible to Multiple Bit Upsets due to cosmic radiation.

Testing memories requires more details about the internal hardware configurations. However, these details are not provided to the final customer, who is left with inexplicable effects.

This paper proposes a new method to reconstruct architectural details from embedded SoC memories. This method extracts memory design parameters from Multiple Bit Upsets (MBUs) generated through a single irradiation test. The algorithm was tested on around 5,500 randomly generated memories. Each memory was injected with 100 Multiple Event Upsets (MEUs). The algorithm was set to test for each memory 20, 40, 60, 80, and 100 MEUs to validate the proposed approach. Alongside the correct memory design configuration (MDC), the algorithm found other possible MDCs. The quantity of these equivalent configurations decreased with the increment of the considered MEUs. This number decreased to an average of 2 equivalent MDCs when considering 100 MEUs.

13:00-14:30 Session 18B: Special Session: Emerging Hardware Security Methods and Tools
Chairs:
Ayesha Khalid (Queen's University Belfast, UK)
Abdulhadi Shoufan (Khalifa University, UAE)
13:00
Safiullah Khan (Queen's University Belfast, UK)
Ayesha Khalid (Queen's University Belfast, UK)
Ciara Rafferty (Queen's University Belfast, UK)
Yasir Ali Shah (Queen's University Belfast, UK)
Maire O'Neill (Queen's University Belfast, UK)
Wai Kong Lee (Gachon University, South Korea)
Seong Oun Hwang (Gachon University, South Korea)
Efficient, Error-Resistant NTT Architectures for CRYSTALS-Kyber FPGA Accelerators
13:30
Emre Koçer (Sabancı University, Turkey)
Can Ayduman (Sabancı University, Turkey)
Selim Kırbıyık (Sabancı University, Turkey)
Ahmet Can Mert (University of Technology Graz, Austria)
Erkay Savaş (Sabancı University, Turkey)
Efficient Design-Time Flexible Hardware Architecture for Accelerating Homomorphic Encryption
14:00
Gokulnath Rajendran (Nanyang Technological University, Singapore)
Furqan Zahoor (Nanyang Technological University, Singapore)
Simranjeet Singh (IIT Bombay, India)
Farhad Merchant (Newcastle University, UK)
Vikas Rana (Forschungszentrum Jülich, Germany)
Anupam Chattopadhyay (Nanyang Technological University, Singapore)
PR-PUF: A Reconfigurable Strong RRAM PUF
14:30-14:45 Coffee Break (Main Building Dining Room)
14:45-15:30 Session 19A: PhD Forum
Chair:
Mahmoud Ismail (American University of Sharjah, UAE)
Enas Abu Libdeh (Khalifa University, Abu Dhabi, UAE)
Leen Younes (Khalifa University, UAE)
Baker Mohammad (Khalifa University, UAE)
Mahmoud Al-Qutayri (Khalifa University, UAE)
Hani Saleh (Khalifa University, UAE)
Ibrahim Elfadel (Khalifa University, UAE)
Correlation Power Attacks on TEA

ABSTRACT. Cryptography provides a means to protect information by converting it into encrypted form, but the threat of physical attacks, particularly side-channel attacks, poses a significant risk to the security of such systems. The primary aim of this work is to conduct correlation power attacks on the Tiny Encryption Algorithm (TEA), with the intention of retrieving the key. To achieve this goal, multiple attack strategies are devised, leveraging the inherent non-linearity in the TEA algorithm.

Leen Younes (Khalifa University, UAE)
Baker Mohammad (Khalifa University, UAE)
Mahmoud Al-Qutayri (Khalifa University, UAE)
Hani Saleh (Khalifa University, UAE)
Dima Kilani (University of British Columbia, Canada)
Real-Time Switched Capacitor Based Power Side-Channel Attack Detection

ABSTRACT. Side-channel attack (SCA) is regarded as a significant risk to the hardware implementation of cryptographic systems. Side-channel information, such as timing, power, and electromagnetic radiation, is leaked through the system and can be exploited for secret key extraction. The work proposes a real-time and compatible detection method for power SCAs. The technique makes use of a switched capacitor DC-DC (SC-DCDC) converter along with a lightweight artificial intelligence engine for power SCA detection. The proposed system, referred to as EoH, has the ability to perform dynamic voltage scaling and learn the behaviors of the cryptographic system to identify any potential attacks. The switching activities of the SC-DCDC converter can be viewed as a measurement of the cryptographic function. Thus, the recurrent neural network was chosen as it best processes time-series data. The technique is system-specific, meaning that during the enrollment phase, the normal operation of the system is learned. The technique can also be expanded to include other types of SCA and is not limited to power.

Damiano Zuccala (University Grenoble Alpes, France)
Jean-Marc Daveau (STMicroelectronics, France)
Philippe Roche (STMicroelectronics, France)
Katell Morin-Allory (University Grenoble Alpes, France)
Formal Evaluation of Digital Circuits Robustness

ABSTRACT. With the complexity of digital systems rapidly increasing over time, innovative strategies are required to verify their robustness within relatively short time scales. Thanks to progress in computing power and the growing availability of computational resources, automated formal methods are nowadays advanced enough to address this task, providing outcomes of high quality and precision in a feasible time. This work presents a general methodology to quantitatively define, by formal verification, the robustness (in both time and space) of digital circuits. The main flow concerning error injection, propagation and detection combines the efficiency of model checking and the versatility of simulation, evaluating in a new scheme the robustness level of the fault targets.

S Sivakumar (Indian Institute of Technology Guwahati, India)
John Jose (Indian Institute of Technology Guwahati, India)
Improving Lifetime and Performance of Non Volatile Memory Caches

ABSTRACT. Conventional memory technologies like SRAM are inadequate to meet the demand for large on-chip and off-chip memory, owing to their low packaging density and high leakage power. Emerging non-volatile memory (NVM) technologies like Spin Transfer Torque RAM (STT-RAM), Phase Change RAM (PCRAM), and Resistive RAM (ReRAM) are promising candidates. They have high packaging density and zero leakage power, which is desirable for realizing large memories. NVMs can be Single-Level Cell (SLC) or Multi-Level Cell (MLC). One memory cell can store one bit of information in SLC STT-RAM, whereas MLC STT-RAM can store two or more bits in a single memory cell. MLC STT-RAM has a higher cell density than SLC STT-RAM. However, these technologies suffer the drawbacks of higher latency and low write endurance. The write endurance of a memory cell is the maximum number of writes that the cell can withstand before it completely wears out. Applications with non-uniform write patterns can cause some memory cells to wear out earlier than others, affecting the entire memory. Our work aims to improve the lifetime of SLC and MLC NVMs when used as the last-level cache through write distribution techniques.

Uppugunduru Anil Kumar (B V Raju Institute of Technology, Narsapur, India)
Syed Ershad Ahmed (BITS Pilani Hyderabad Campus, India)
Design of Power Efficient Approximate Square Root for Ultrasound imaging system

ABSTRACT. This paper introduces a new method for designing a square root (SQR) circuit, called the adaptive approximation approach. The approach involves utilizing a reduced-width approximate SQR circuit and a shifter to compute the SQR, while adaptively removing insignificant input bits. This adaptive operation results in minimal maximum error distances for the approximate SQR circuit. In comparison to an accurate 16-bit array SQR circuit, the proposed approximate design, which uses a 6-bit radicand, offers significant advantages. It is 4 times faster and consumes only 20.66% of the power. Furthermore, the proposed designs demonstrate superior performance compared to other approximate designs in image processing applications, specifically in envelope detection.
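The idea summarized above (drop insignificant input bits, take the square root of a narrow radicand, then shift the result back) can be sketched behaviorally as follows. This is a minimal illustration and not the authors' circuit: the function name, the even-shift rule, and the use of an exact integer square root in place of their approximate reduced-width SQR block are all assumptions.

```python
import math


def approx_sqrt(x, width=6):
    """Approximate floor(sqrt(x)) using only the top `width` bits of x.

    Hypothetical behavioral model: the low-order bits are discarded, the
    narrow radicand goes through an exact integer square root (standing in
    for a reduced-width SQR block), and a shifter rescales the result.
    """
    if x < 0:
        raise ValueError("x must be non-negative")
    # Number of insignificant low-order bits to remove.
    drop = max(0, x.bit_length() - width)
    # Round up to an even count so that sqrt(2**drop) is a power of two.
    drop += drop % 2
    return math.isqrt(x >> drop) << (drop // 2)


if __name__ == "__main__":
    for value in (49, 10000, 65535):
        print(value, approx_sqrt(value), math.isqrt(value))
```

Inputs that already fit in `width` bits are computed exactly in this model; wider inputs lose accuracy in exchange for a much narrower square-root datapath.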

Nermine Edward (Canadian International College-CIC, Cairo, Egypt)
Sahar Hamed (The Knowledge Hub Universities, Coventry University, Cairo Branch, Egypt)
Nahla El-Araby (TU Wien, Vienna, Austria and Canadian International College, Cairo, Egypt)
Abdelhalim Zekry (Faculty of Engineering, Ain Shams University, Egypt)
Memristor-Based Power Efficient SRAM Cell

ABSTRACT. Random Access Memory (RAM) cells are widely used in different digital electronic systems. Power consumption is one of the challenging factors in designing Static Random Access Memory (SRAM) cells, especially for ultra-low-power applications such as implanted biomedical devices. The common approach for reducing power consumption in Complementary Metal Oxide Semiconductor (CMOS) SRAM cells is to scale down the supply voltage. However, this implies scaling down the process technology, which affects the read and write stability. This paper proposes a fast and power-efficient SRAM cell; the proposed cell design consists of 4 transistors and 4 memristors. The proposed 4T4M SRAM cell showed improved results in terms of power consumption as well as delay when compared to the 6T and 4T2M SRAM cells. All SRAM cells were modeled and simulated in Cadence Virtuoso using 130 nm technology. The proposed 4T4M SRAM cell consumed 11.56% and 62% less power than the 4T2M and 6T SRAM cells, respectively. Moreover, its response was 10.5% and 57.5% faster than that of the 4T2M and 6T SRAM cells, respectively.

Abdul Rehman Aslam (Lahore University of Management Sciences, Pakistan)
Muhammad Awais Bin Altaf (Lahore University of Management Sciences, Pakistan)
Design and Analysis of an On-Chip Processor for Autism Spectrum Disorder Children Assistance Using Their Emotions

ABSTRACT. Autism Spectrum Disorder (ASD) is a “spectrum” neurological disorder causing different physical and cognitive disabilities. A major difficulty faced by ASD patients is deregulated emotion, which causes unpredictable and instantaneous bursts of negative emotions. These negative emotion outbursts (NEOB) cause severe self-injuries and are a major hurdle for the treatment and rehabilitation of ASD patients. The unavailability of biomarkers for early prediction, along with the life-long nature of the disorder, requires life-long medical care and assistance for ASD patients. A physical or cognitive assistance system that can ease the severe difficulties faced by ASD children due to these NEOBs is urgently needed. This Ph.D. thesis has targeted the early prediction of NEOB for ASD children. Early prediction can help the parents and caregivers control and regulate their emotions. I have proposed and developed a wearable system-on-chip (SoC) based digital back-end (DBE) processor for negative emotion and NEOB prediction using Electroencephalogram (EEG) signals. A miniaturized, low-power SoC processor with a limited number of electrodes can be embedded in a headband as a patch sensor for continuous (24/7) prediction of NEOB/negative emotions. I have proposed and developed two (1st generation and 2nd generation) SoC-based DBE processors for the NEOB prediction.

Rose George Kunthara (CUSAT, India)
Rekha K James (CUSAT, India)
Fine Tuning Network Performance in Bufferless On-chip Networks

ABSTRACT. Modern data-driven applications with huge processing requirements, such as cloud computing, big data processing and high-performance computing, mostly employ multi-core processors. Network-on-Chip (NoC), a packet-based network, has emerged as a popular on-chip interconnect solution that can overcome the scalability and bottleneck challenges of the conventional bus-based approach employed in Tiled Chip Multi-core Processors (TCMP). However, a large part of on-chip area and power is due to the NoC, to which the input flit buffers of a standard Virtual Channel (VC) router are a major contributor. Thus, NoC designs require efficient router microarchitecture, topology, routing algorithms and power-aware designs. Bufferless NoC is an alternative design approach to curb the rising area and power issues associated with traditional buffered NoC routers. CHIPPER, a popular bufferless deflection router, follows a parallel port allocation technique and uses a golden flit scheme to prioritize the flits. We have devised various novel techniques to limit unwanted flit deflections in CHIPPER-based NoCs for attaining better network performance with minimal hardware modification.

14:45-15:30 Session 19B: Student Forum
Chair:
Amer Zakaria (American University of Sharjah, UAE)
Esrat Khan (Khalifa University, UAE)
Amal Alhashmi (Khalifa University, UAE)
Ibrahim Elfadel (Khalifa University, UAE)
Experiments in Power Side Channel Attacks: DPA and CPA on AES-128

ABSTRACT. The paper presents a comparative analysis of two power side-channel attacks, namely differential power analysis (DPA) and correlation power analysis (CPA), applied to the Advanced Encryption Standard (AES) algorithm. The objective is to assess the efficacy of these attacks on an asynchronous-logic masked AES accelerator, specifically designed to withstand power analysis attacks. The experiments are conducted using a carefully constructed test platform that enables the collection of power consumption data during the encryption process. The findings reveal that both DPA and CPA attacks successfully uncover the AES key, although CPA is slightly more effective than DPA. Additionally, this study explores the impact of countermeasures, such as masking and shuffling, on the effectiveness of these attacks. The results demonstrate that the implementation of these countermeasures substantially diminishes the effectiveness of DPA and CPA attacks. Overall, this study emphasizes the significance of deploying countermeasures to mitigate power side-channel attacks, while providing valuable insights into the relative effectiveness of DPA and CPA attacks on AES-128.

Venkat Sai Arva (B V Raju Institute of Technology, India)
Anil Kumar Uppugunduru (B V Raju Institute of Technology, India)
Power Efficient Approximate Multiplier for Neural Network Applications

ABSTRACT. This paper presents a new architecture for an approximate unsigned multiplier, aiming to minimize both area utilization and power consumption while maintaining high accuracy. The architecture is divided into three regions: the least significant region (LSR), the approximate region, and the accurate region (most significant region). Because the LSR contributes least to the final result, it is replaced with zeros to increase hardware savings. The approximate region, on the other hand, utilizes two new approximate compressors, which are highly efficient 4:2 compressors introducing +1 and -1 errors. These compressors are carefully designed to neutralize each other, thereby mitigating the overall error introduced by the approximation. Experimental results for 8-bit multipliers demonstrate that the proposed designs outperform the existing design in terms of power, achieving an improvement of 15%. Furthermore, the proposed designs are evaluated using image processing and neural network applications, demonstrating their effectiveness in practical scenarios.
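The effect of zeroing the least significant region can be illustrated with a simple behavioral model. The sketch below covers only that one step under stated assumptions: the function name and the default region width `lsr_cols=4` are hypothetical, and the +1/-1 approximate 4:2 compressors of the middle region are not modeled at all.

```python
def truncated_multiply(a, b, bits=8, lsr_cols=4):
    """Unsigned multiply with the lowest `lsr_cols` product columns zeroed.

    Hypothetical model of the LSR step only: partial-product bit a_i * b_j
    lands in column i + j, and every column below `lsr_cols` is discarded.
    All remaining columns are summed exactly.
    """
    total = 0
    for i in range(bits):
        for j in range(bits):
            in_kept_column = i + j >= lsr_cols
            if in_kept_column and (a >> i) & 1 and (b >> j) & 1:
                total += 1 << (i + j)
    return total


if __name__ == "__main__":
    a, b = 200, 123
    print(a * b, truncated_multiply(a, b))
```

Because only low-weight columns are removed, the result never exceeds the exact product in this model, and setting `lsr_cols=0` recovers the exact product.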

Abhinav Agarwal (Purdue University, United States)
Dr. Guoping Wang (Purdue University, United States)
GPU vs CPU Implementation of the Kernighan-Lin Partitioning Algorithm for Improved Partitioning Results

ABSTRACT. The Kernighan-Lin (KL) Partitioning Algorithm divides a graph into two roughly equal-sized sets, minimizing the number of edges that connect the two. The algorithm's core idea is to iteratively exchange nodes between the two sets in order to improve the partitioning. The initial cut size, or the number of edges that cross the partition, is determined by first splitting the nodes into two sets. The process then repeatedly chooses a pair of nodes from different sets whose exchange causes the greatest reduction in cut size. This process is repeated until no further improvement is possible. The algorithm is implemented on two hardware platforms: a CPU and a GPGPU. The former was implemented in C++ and the latter in CUDA C using Nvidia's nvGRAPH library. The complexity of the CPU implementation of the KL algorithm for VLSI partitioning is O(n^3). To overcome this high complexity, the different stages of the KL Partitioning Algorithm can be completed in parallel on the GPU, which features a highly parallel design with thousands of smaller, more efficient cores that are intended to do many jobs at once. Hence, the GPU implementation reduces the complexity to O(n).
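The swap loop described in this abstract can be sketched in a few lines. This is a simplified greedy variant that follows the abstract's wording (pick the single best swap, repeat until no gain); it omits the node locking and best-prefix selection of the full Kernighan-Lin pass, and it is not the authors' C++ or CUDA implementation. All function names are hypothetical.

```python
from itertools import product


def cut_size(edges, side):
    """Count edges whose two endpoints are in different sets."""
    return sum(1 for u, v in edges if side[u] != side[v])


def best_swap(edges, side):
    """Return (gain, a, b) for the one swap that reduces the cut the most."""
    base = cut_size(edges, side)
    left = [n for n, s in side.items() if s == 0]
    right = [n for n, s in side.items() if s == 1]
    best = (0, None, None)
    for a, b in product(left, right):
        trial = dict(side)
        trial[a], trial[b] = 1, 0          # exchange a and b
        gain = base - cut_size(edges, trial)
        if gain > best[0]:
            best = (gain, a, b)
    return best


def greedy_refine(edges, side):
    """Apply the best swap repeatedly until no swap reduces the cut."""
    side = dict(side)
    while True:
        gain, a, b = best_swap(edges, side)
        if gain <= 0:
            return side
        side[a], side[b] = 1, 0


if __name__ == "__main__":
    edges = [(0, 1), (2, 3)]
    start = {0: 0, 2: 0, 1: 1, 3: 1}
    refined = greedy_refine(edges, start)
    print(cut_size(edges, start), cut_size(edges, refined))
```

Each swap keeps the two set sizes unchanged, which is how the balance of the initial split is preserved. Recomputing the cut for every candidate pair is deliberately naive; it keeps the sketch short at the cost of speed.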

Shanu Kumar (Birla Institute of Technology & Science Pilani, Hyderabad Campus, India)
Anil Kumar Uppugunduru (B V Raju Institute of Technology, Narsapur, India)
Syed Ershad Ahmed (Birla Institute of Technology & Science Pilani, Hyderabad Campus, India)
Highly Accurate Adaptive Approximate Divider Architecture For Error Resilient Application

ABSTRACT. This paper proposes a new restoring array divider (RAD) architecture based on an algorithm that adapts the approximate subtractor from the existing subtractors, aiming to reduce error complexity while maintaining accuracy within acceptable limits. The accuracy and power trade-off of the proposed designs is substantiated through the analysis of normalized error distance (NED) and the NMED-Power product. The proposed design exhibits 22.55% higher accuracy than the latest existing design, as well as a significant 19.04% reduction in area and a 15.91% reduction in power consumption compared to the exact design. To further evaluate the quality-effort trade-off, both the proposed and existing RAD architectures are implemented in change detection applications.

Alisha P B (Cochin University of Science and Technology, India)
Dr. Tripti S Warrier (Cochin University of Science and Technology, India)
Design of stochastic Magnetic Tunnel Junction with an Optimized Free Layer for Probabilistic Spin Logic

ABSTRACT. The paper investigates the performance of a low-barrier stochastic MTJ (LBM) for probabilistic spin logic (PSL) design. The p-bit (the basic unit of probabilistic computing) is generated in the circuit and its quality is analyzed statistically. The suggested p-bit design results in an average 60% increase in generation speed. It requires 50% less energy to generate better randomness than other spintronic counterparts. It also features a 60% area decrease compared to conventional p-bit designs. Invertible AND and OR gates were implemented using the p-bit and work exactly as predicted. The idea of probabilistic computing is naturally related to another new paradigm in machine learning, the binary stochastic neuron (BSN). Many machine learning algorithms use BSNs, which has prompted the creation of hardware accelerators for this challenging task.

Felix Braun (Institute of Computer Technology, TU Wien, Austria)
Nahla El-Araby (TU Wien, Vienna, Austria and Canadian International College, Cairo, Egypt, Egypt)
Hardware Security against Replay Attacks at Partial Reconfiguration

ABSTRACT. The use of Dynamic Partial Reconfiguration (DPR) in Field Programmable Gate Arrays (FPGAs) brings new challenges in terms of security, particularly regarding replay attacks. Implementing proper security measures, such as nonces, timestamps, and encryption, is essential to prevent replay attacks. In this work, we propose a system that provides an additional security layer for FPGAs that are meant to be partially programmed. Through implementation and experiments, the proposed design shows enhanced security along with simplicity, high flexibility, and customisation and scaling options.

Ahmad Mansour (American University of Sharjah, UAE)
Mostafa Nasr (American University of Sharjah, UAE)
Baraa Abed (American University of Sharjah, UAE)
Amr Abu Al Haj (American University of Sharjah, UAE)
Safwan Khan (American University of Sharjah, UAE)
Lutfi Albasha (American University of Sharjah, UAE)
Hassan Mir (American University of Sharjah, UAE)
Extended Abstract: FPGA Digitization of Radar Signals

ABSTRACT. The objective of this research is to develop a compact demodulator, digital-to-analog converter (DAC) and analog-to-digital converter (ADC) for analyzing a radar system that was previously implemented and researched for close-range surveillance applications. Previous research focused on reducing the size of said radar system while still producing viable outputs. The current methods used for analyzing a radar system require a large desk-sized demodulator, ADC, and spectrum analyzer. This research aims to replace these traditional signal processing methods with a field programmable gate array (FPGA) board.

15:30-17:00 Session 20: Hardware Security
Chairs:
Farhad Merchant (Newcastle University, UK)
Michail Maniatakos (New York University Abu Dhabi, UAE)
15:30
Shayesteh Masoumian (Delft University of Technology, Netherlands)
Roel Maes (Intrinsic ID B.V., Netherlands)
Rui Wang (Intrinsic ID B.V., Netherlands)
Karthik Keni Yerriswamy (Intrinsic ID B.V., Netherlands)
Geert-Jan Schrijen (Intrinsic ID B.V., Netherlands)
Said Hamdioui (Delft University of Technology, Netherlands)
Mottaqiallah Taouil (Delft University of Technology, Netherlands)
Modeling and Analysis of SRAM PUF Bias Patterns in 14nm and 7nm FinFET Technology Nodes

ABSTRACT. SRAM Physical Unclonable Functions (PUFs) are one of the popular forms of PUFs that can be used to generate unique identifiers and randomness for security purposes. Hence, their resilience to attacks is crucial. The probability of attacks increases when the SRAM PUF start-up values follow a predictable pattern, which we refer to as bias. In this paper, we investigate the parameters impacting the SRAM PUF bias of advanced FinFET SRAM designs. In particular, we analyze the bias with respect to temperature, mismatches in the power supply network, and ramp-up time. We also consider process variation, circuit noise, and SRAM layout in our analysis. Our simulation results match the silicon measurements. From the experiments we conclude that (i) the SRAM layout and in particular the power supply network can lead to a bias, (ii) this bias increases with temperature, and (iii) this bias increases when the supply ramp-up time decreases.

15:48
Anjum Riaz (Indian Institute of Technology Jammu, India)
Gaurav Kumar (Indian Institute of Technology Jammu, India)
Pardeep Kumar (Indian Institute of Technology Jammu, India)
Yamuna Prasad (Indian Institute of Technology Jammu, India)
Satyadev Ahlawat (Indian Institute of Technology Jammu, India)
On Protecting IJTAG using an Inherently Secure SIB
PRESENTER: Anjum Riaz

ABSTRACT. Modern VLSI circuits feature various embedded instruments that support non-functional features, e.g., test/debug, diagnosis, post-silicon validation, in-field maintenance, etc. The IEEE Std. 1687 (IJTAG) facilitates efficient access to these on-chip instruments using a special scan cell known as the Segment Insertion Bit (SIB). At the same time, it provides a covert channel for potential intruders to gain unauthorized access to these embedded instruments and thus extract confidential data such as secret keys, FPGA firmware, Chip ID, etc. For this reason, it is imperative to restrict access to embedded instruments. Various techniques are present in the literature for enhancing the security of the IJTAG network. However, securing the test infrastructure at the cost of complex hardware resources is not always a feasible solution. In this paper, a new mechanism to secure the IJTAG network, based on a new Inherently Secure SIB (ISSIB), is proposed. The proposed technique makes use of an LFSR that is formed using the update cell of the ISSIBs. The proposed scheme is simple to implement, highly scalable, and provides a high level of security against unauthorized access. In addition, the proposed scheme preserves the conventional IJTAG features and has negligible area overhead.

16:06
Hala Ibrahim (Computer and Systems Engineering, Ain Shams University, Cairo, Egypt, Egypt)
Haytham Azmi (Microelectronics Department, Electronics Research Institute, Cairo, Egypt, Egypt)
M. Watheq El-Kharashi (Computer and Systems Engineering, Ain Shams University, Cairo, Egypt, Egypt)
Mona Safar (Computer and Systems Engineering, Ain Shams University, Cairo, Egypt, Egypt)
Hardware Security Analysis of Arbiters: Trojan Modeling and Formal Verification

ABSTRACT. Due to the scale of modern systems, pre-silicon security has become a major concern for design and verification engineers. In this paper, we propose a formal verification framework for the verification of different arbiter circuits with different protocols and sizes using SystemVerilog Assertions (SVA). We also propose a formal way of modeling and inserting hardware Trojans of different trigger and payload types without applying any modifications to the Design Under Test (DUT). The obtained results show the formal analysis statistics in terms of time and memory for Trojan-free designs for a set of all-proven properties. They also show how Trojan insertion affects the pass-fail criteria of the formal properties, where at least a single property fails due to the inserted Trojan. The proposed work can be generalized to verify the correct functionality and security of an arbiter circuit placed within any complex system.

16:24
Rupesh Karn (Khalifa University, UAE)
Kashif Nawaz (Technology Innovation Institute, UAE)
Ibrahim Elfadel (Khalifa University, UAE)
Post-Quantum, Order-Preserving Encryption for the Confidential Inference in Decision Trees: FPGA Design and Implementation
PRESENTER: Ibrahim Elfadel

ABSTRACT. One main objective of this paper is to show how to adapt the well-known, lattice-based NTRU post-quantum encryption to confidential inference in decision trees. Another objective is to describe a resource-efficient FPGA implementation of the adapted NTRU. The typical use case of such encryption is that of two parties where one party has proprietary ownership of the decision tree model while the other party has proprietary ownership of the data. Confidential inference in decision trees can be ensured using order-preserving cryptography, which has much weaker requirements and is therefore easier to implement than fully-homomorphic cryptography. Post-quantum NTRU is not order-preserving, but interestingly, it can be modified to obey the order-preserving property. We call the resulting cipher OP-NTRU. Lossless compression can be applied to the ciphertext produced by OP-NTRU to facilitate its hardware acceleration. OP-NTRU has been implemented on an FPGA with the HDL code automatically compiled from the machine-learning framework. Confidential inference experiments are performed in hardware using the MNIST dataset.
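
To illustrate why an order-preserving cipher is sufficient for decision-tree inference, the toy sketch below replaces OP-NTRU with a random strictly increasing lookup table. The table is neither post-quantum nor secure, and key distribution between the two parties is not modeled. The names `make_ope_table`, `encrypt_tree`, and `predict`, as well as the nested-dictionary tree layout, are assumptions made for illustration.

```python
import random


def make_ope_table(seed, domain_size):
    """Toy order-preserving map: plaintext v is encrypted as table[v]."""
    rng = random.Random(seed)
    table, current = [], 0
    for _ in range(domain_size):
        current += rng.randint(1, 1000)  # positive gap keeps the order strict
        table.append(current)
    return table


def predict(tree, features):
    """Walk the tree using only '<=' comparisons against thresholds."""
    node = tree
    while "label" not in node:
        go_left = features[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["label"]


def encrypt_tree(tree, table):
    """Encrypt every threshold; structure and leaf labels stay unchanged."""
    if "label" in tree:
        return dict(tree)
    return {
        "feature": tree["feature"],
        "threshold": table[tree["threshold"]],
        "left": encrypt_tree(tree["left"], table),
        "right": encrypt_tree(tree["right"], table),
    }
```

Because the map is strictly increasing, `v <= t` holds exactly when `table[v] <= table[t]`, so every comparison in the tree takes the same branch on ciphertexts as on plaintexts and the predicted label is unchanged.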

16:42
Lennart M. Reimann (RWTH Aachen University, Germany)
Jonathan Wiesner (RWTH Aachen University, Germany)
Dominik Sisejkovic (Corporate Research, Robert Bosch GmbH, Germany)
Farhad Merchant (Newcastle University, UK)
Rainer Leupers (RWTH Aachen University, Germany)
SoftFlow: Automated HW-SW Confidentiality Verification for Embedded Processors

ABSTRACT. Despite its ever-increasing impact, security is not considered as a design objective in commercial electronic design automation (EDA) tools. This results in vulnerabilities being overlooked during the software-hardware design process. Specifically, vulnerabilities that allow leakage of sensitive data might stay unnoticed by standard testing, as the leakage itself might not result in evident functional changes. Therefore, EDA tools are needed to elaborate the confidentiality of sensitive data during the design process. However, state-of-the-art implementations either solely consider the hardware or restrict the expressiveness of the security properties that must be proven. Consequently, more proficient tools are required to assist in the software and hardware design. To address this issue, we propose SoftFlow, an EDA tool that allows determining whether a given software exploits existing leakage paths in hardware. Based on our analysis, the leakage paths can be retained if proven not to be exploited by software. This is desirable if the removal significantly impacts the design's performance or functionality, or if the path cannot be removed as the chip is already manufactured. We demonstrate the feasibility of SoftFlow by identifying vulnerabilities in OpenSSL cryptographic C programs, and redesigning them to avoid leakage of cryptographic keys in a RISC-V architecture.