Program for Monday, October 16th

08:00-09:00 Session 7: Registration

Location: Main Building Rotunda

09:00-10:00 Session 8: Opening Ceremony and First Keynote: Dr. Chris Clifton, Sony Semiconductors, UK

Chair:

Lutfi Albasha (American University of Sharjah, UAE)

Location: Main Building Hall A

10:00-10:15 Coffee Break (Main Building Dining Room)

10:15-11:45 Session 9: VLSI Designs for AI Acceleration

Chairs:

Abir Hussain (University of Sharjah, UAE)
Hussam Amrouch (Technical University of Munich (TUM), Germany)

Location: Main Building Hall A

10:15

Yi Chen (RWTH Aachen University, Germany)
Jie Lou (RWTH Aachen University, Germany)
Christian Lanius (RWTH Aachen University, Germany)
Florian Freye (RWTH Aachen University, Germany)
Johnson Loh (RWTH Aachen University, Germany)
Tobias Gemmeke (RWTH Aachen University, Germany)

An Energy-Efficient and Area-Efficient Depthwise Separable Convolution Accelerator with Minimal On-Chip Memory Access

PRESENTER: Yi Chen

ABSTRACT. In this paper, we present a hardware accelerator for depthwise separable convolution (DSC) that enables 100% utilization of the processing element (PE) array for depthwise convolution (DWC) and up to 98% utilization for pointwise convolution (PWC), while also reducing latency. By partitioning the input feature map (ifmap) SRAM of the DWC into three banks, we minimize memory access and maximize data reuse. The input activations and weights only need to be loaded once from SRAM to PE for both DWC and PWC. Additionally, to support efficient operations across different layers, we present a layerwise matching method. The proposed DSC accelerator is implemented in 22nm FDSOI technology and validated using MobileNetV1 on the CIFAR10 dataset. The post-layout results demonstrate that the proposed accelerator can operate at 1GHz and achieve an energy efficiency of 5.07 (3.96) TOPS/W and an area efficiency of 519.2 (461.52) GOPS/mm² for DWC (PWC) at 0.8V. After scaling the supply voltage down to 0.5V, the energy efficiency of the proposed accelerator increases to 13.64 TOPS/W for DWC and 10.64 TOPS/W for PWC.

10:33

Rebecca Pelke (Institute for Communication Technologies and Embedded Systems (ICE), RWTH Aachen, Germany)
Nils Bosbach (Institute for Communication Technologies and Embedded Systems (ICE), RWTH Aachen, Germany)
Jose Cubero (Institute for Communication Technologies and Embedded Systems (ICE), RWTH Aachen, Germany)
Felix Staudigl (Institute for Communication Technologies and Embedded Systems (ICE), RWTH Aachen, Germany)
Rainer Leupers (Institute for Communication Technologies and Embedded Systems (ICE), RWTH Aachen, Germany)
Jan Moritz Joseph (Institute for Communication Technologies and Embedded Systems (ICE), RWTH Aachen, Germany)

Mapping of CNNs on multi-core RRAM-based CIM architectures

ABSTRACT. RRAM-based multi-core systems improve the energy efficiency and performance of CNNs. However, the distributed parallel execution of convolutional layers causes critical data dependencies that limit the potential speedup. This paper presents synchronization techniques for parallel inference of convolutional layers on RRAM-based CIM architectures. We propose an architecture optimization that enables efficient data exchange and discuss the impact of different architecture setups on the performance. The corresponding compiler algorithms are optimized for high speedup and low memory consumption during CNN inference. We achieve more than 99% of the theoretical acceleration limit with a marginal data transmission overhead of less than 4% for state-of-the-art CNN benchmarks.

10:51

Jingbo Jiang (The Hong Kong University of Science and Technology, Hong Kong)
Xizi Chen (Huazhong Agricultural University, China)
Chi-Ying Tsui (The Hong Kong University of Science and Technology, Hong Kong)

Accelerating Large Kernel Convolutions with Nested Winograd Transformation

ABSTRACT. Recent literature has shown that convolutional neural networks (CNNs) with large kernels outperform vision transformers (ViTs) and CNNs with stacked small kernels in many computer vision tasks, such as object detection and image restoration. The Winograd transformation helps reduce the number of repetitive multiplications in convolution and is widely supported by many commercial AI processors. Researchers have proposed accelerating large kernel convolutions by linearly decomposing them into many small kernel convolutions and then sequentially accelerating each small kernel convolution with the Winograd algorithm. This work proposes a nested Winograd algorithm that iteratively decomposes a large kernel convolution into small kernel convolutions and proves it to be more effective than the linear decomposition Winograd transformation algorithm. Experiments show that compared to the linear decomposition Winograd algorithm, the proposed algorithm reduces the total number of multiplications by 1.4 to 10.5 times for computing 4×4 to 31×31 convolutions.

11:09

Imlijungla Longchar (Indian Institute of Technology Guwahati, India)
Hemangee Kapoor (Indian Institute of Technology Guwahati, India)

ADaMaT: Towards an Adaptive Dataflow for Maximising Throughput in Neural Network Inference

ABSTRACT. With the development of research in hardware for Convolutional Neural Network (CNN) algorithms, it becomes crucial to examine the different aspects of hardware design. CNNs are mainly used in computer vision applications, and translating these algorithms into hardware calls for adopting an appropriate dataflow to improve the utilisation of hardware resources, resulting in higher throughput. In particular, the inference task at each neuron position can be assigned to a compute unit in the hardware accelerator, and several such neuron positions can be completed in parallel. We observe that adopting a static dataflow for an architecture can result in the under-utilisation of resources because of the different dimensions of the data in the network. The motivation of this paper is built upon the need for an adaptive dataflow to improve the multiply-and-accumulate (MAC) utilisation in CNNs. We propose a method, ADaMaT, which adapts the dataflow at runtime by appropriately assigning tasks to the MAC units depending on the dimensions of the layers instead of a pre-determined assignment. The adaptive assignment tries to maximise the MAC utilisation and improve the throughput. We have performed a comparative analysis among different static dataflows and our proposed ADaMaT dataflow.

11:27

Hanning Chen (University of California, Irvine, United States)
Yeseong Kim (Daegu Gyeongbuk Institute of Science and Technology, South Korea)
Elaheh Sadredini (University of California, Riverside, United States)
Saransh Gupta (University of California San Diego, United States)
Hugo Latapie (Cisco Systems, United States)
Mohsen Imani (University of California, Irvine, United States)

Sparsity Controllable Hyperdimensional Computing for Genome Sequence Matching Acceleration

ABSTRACT. In this paper, we propose a Hyper-Dimensional genome analysis platform. Instead of working with original sequences, our method maps the genome sequences into high-dimensional space and performs sequence matching with simple and parallel similarity searches. At the algorithm level, we revisit the sequence searching with brain-like memorization that Hyper-Dimensional computing natively supports. Instead of working on the original data, we map all data points into high-dimensional space, enabling the main sequence searching operations to process in a hardware-friendly way. We accordingly design a density-aware FPGA implementation. Our solution searches the similarity of an encoded query and large-scale genome library through different chunks. We exploit the holographic representation of patterns to stop search operations on libraries with a lower chance of a match. This translates our computation from dense to highly sparse after just a few chunk-based searches. Our evaluation shows that our accelerator can provide 46× speedup and 188× energy efficiency improvement compared to a state-of-the-art GPU implementation. Results show that our accelerator achieves up to 3440.6 GCUPS using a single Xilinx Alveo U280 board.
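
The 10:51 talk in this session builds on the Winograd transformation. As a point of reference only, the following minimal Python sketch shows the standard one-dimensional F(2,3) case, which produces two outputs of a 3-tap filter with 4 multiplications instead of 6. It is the textbook small-kernel building block, not the nested algorithm proposed in the paper, and the function names are illustrative.

```python
from fractions import Fraction


def direct_f23(d, g):
    """Two outputs of a 3-tap correlation, computed directly (6 multiplications)."""
    return [
        d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
        d[1] * g[0] + d[2] * g[1] + d[3] * g[2],
    ]


def winograd_f23(d, g):
    """Same two outputs via Winograd F(2,3) (4 data-dependent multiplications).

    The filter-side terms u1 and u2 depend only on the weights, so in
    practice they would be precomputed once per filter.
    """
    half = Fraction(1, 2)
    u1 = (g[0] + g[1] + g[2]) * half
    u2 = (g[0] - g[1] + g[2]) * half
    m1 = (d[0] - d[2]) * g[0]
    m2 = (d[1] + d[2]) * u1
    m3 = (d[2] - d[1]) * u2
    m4 = (d[1] - d[3]) * g[2]
    return [m1 + m2 + m3, m2 - m3 - m4]
```

Exact rational arithmetic is used here so the two functions can be compared for equality; a hardware implementation would typically fold the factor of one half into the precomputed weights.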

11:45-12:15 Lunch Break (Main Building Dining Room)

12:15-13:00 Session 10: Luncheon Keynote: Mr. Ahmed Shafqat, Rohde & Schwarz, Germany

Chair:

Nasser Qaddoumi (American University of Sharjah, UAE)

Location: Main Building Dining Room

13:00-14:30 Session 11A: Digital Circuits and System on Chip

Chairs:

Thanos Stouraitis (Khalifa University, UAE)
Chun-Jen Tsai (National Yang Ming Chiao Tung University, Taiwan)

Location: Main Building Hall A

13:00

Deepraj Soni (New York University, United States)
Mohammed Nabeel (New York University Abu Dhabi, UAE)
Ramesh Karri (New York University, United States)
Michail Maniatakos (New York University Abu Dhabi, UAE)

Optimizing Constrained-Modulus Barrett Multiplier for Power and Flexibility

ABSTRACT. Fully Homomorphic Encryption (FHE) promises data protection by computing on encrypted data, but demands resource-intensive computation. FHE hardware accelerators, which improve FHE scheme performance with densely packed computing units, could potentially damage the chip with excessive heat dissipation because of high power consumption. Therefore, it is necessary to reduce the power consumption of the accelerator and its most critical module, i.e., the modular multiplier. In this work, we extend the idea of allowing a specific form of modulus to achieve a low-power modular multiplier. The proposed work can reduce power consumption by 15% and area by 20%. We also propose an approximation for the number of moduli available with the discussed constraints on the modulus form.

13:18

Klajd Zyla (Technical University of Munich, Germany)
Marco Liess (Technical University of Munich, Germany)
Thomas Wild (Technical University of Munich, Germany)
Andreas Herkersdorf (Technical University of Munich, Germany)

FlexPipe: Fast, Flexible and Scalable Packet Processing for High-Performance SmartNICs

PRESENTER: Klajd Zyla

ABSTRACT. Data centers have been struggling to provide the necessary processing capacity to handle the surging rate of network traffic that is generated in an increasingly connected and service-oriented world. As a result, SmartNICs play an even more important role than before as they can offload various network applications and hence free CPU resources for application-layer processing, increase performance and reduce processing time. However, they often do not support flows with different offload requirements and cannot dynamically allocate offloads at runtime. In order to address these limitations, we propose FlexPipe, a fast, flexible and scalable packet-processing architecture for high-performance SmartNICs. Our design enables low-latency and runtime-reconfigurable packet forwarding at high traffic rates with minimal area overhead. Furthermore, it provides load-aware packet steering toward multiple offload units of the same type for low-bandwidth offloads. We implement a prototype of FlexPipe in Verilog and validate it via cycle-accurate register-transfer level simulations. Our evaluation results show that FlexPipe can process packets of arbitrary sizes with different offload requirements at line rate and on average 1.9x faster than a SmartNIC with a predefined sequence of offloads and 1.8x faster than PANIC, a flexible state-of-the-art SmartNIC.

13:36

Rafael Medina Morillas (Embedded Systems Laboratory (ESL), EPFL, Switzerland)
Darong Huang (Embedded Systems Laboratory (ESL), EPFL, Switzerland)
Giovanni Ansaloni (Embedded Systems Laboratory (ESL), EPFL, Switzerland)
Marina Zapater (University of Applied Sciences Western Switzerland (HES-SO), Switzerland)
David Atienza (Embedded Systems Laboratory, STI, EPFL, Switzerland)

REMOTE: Re-thinking Task Mapping on Wireless 2.5D Systems-on-Package for Hotspot Removal

ABSTRACT. 2.5D Systems-on-Package (SoPs) are composed of several chiplets placed on an interposer. They are becoming increasingly popular as they enable easy integration of electronic components in the same package and high fabrication yields. Nevertheless, they introduce a new bottleneck in inter-chiplet communication, which must be routed through the interposer. Such a constraint favors mapping related tasks on computing cores within the same chiplet, leading to thermal hotspots. In-package wireless technology holds promise to reconsider such a position because integrated wireless antennas provide low-latency and high-bandwidth communication paths, thus bypassing the interposer bottleneck. In this work, we propose a new task mapping heuristic that leverages in-package wireless technology to improve the thermal behavior of 2.5D SoPs executing complex applications. Combining system simulation and thermal modeling, our results show that we can distribute computation in wireless 2.5D SoPs to reduce peak temperatures by up to 24% through task mapping with a negligible performance impact.

13:54

Hossein Taji (Embedded System Laboratory (ESL), EPFL (École Polytechnique Fédérale de Lausanne), Switzerland)
José Miranda (Embedded System Laboratory (ESL), EPFL (École Polytechnique Fédérale de Lausanne), Switzerland)
Miguel Peon Quiros (EcoCloud, EPFL (École Polytechnique Fédérale de Lausanne), Switzerland)
Szabolcs Balasi (Nespresso Systems R&D Specialist - Electronics, Switzerland)
David Atienza (Embedded System Laboratory (ESL), EPFL (École Polytechnique Fédérale de Lausanne), Switzerland)

Dynamic Scheduling for Event-Driven Embedded Industrial Applications

ABSTRACT. This paper addresses the optimization of embedded platforms to meet the computing and real-time requirements of cyber-physical systems and IoT applications, including embedded intelligence. In this context, schedulers are vital in enhancing processor utilization in industrial contexts. Although existing research has focused primarily on the schedulability of periodic tasks, event-driven tasks better represent these new embedded intelligence scenarios in the real world. This work explores static and dynamic scheduling policies within a general scenario and a specific case study based on an actual industrial application. The proposed dynamic scheduler has been integrated into the FreeRTOS kernel and has been employed to conduct all of our experiments on industrial products within the smart home domain. Our results show that, while we can respect real-time requirements, our proposed dynamic scheduling can improve the performance of event-driven applications by reducing missed task deadlines by up to 60%. Moreover, we have also developed a lightweight version of our dynamic scheduler for industrial products that reduces average timing overhead for task selection and insertion by up to 34.7% and memory overhead for task creation and list scheduling by up to 74.7% compared to state-of-the-art static alternatives.

14:12

Aishwarya Gupta (Indian Institute of Technology Guwahati, India)
Aswathy N S (Indian Institute of Technology Guwahati, India)
Hemangee K. Kapoor (Indian Institute of Technology Guwahati, India)

Look before you leap: An Access-based Prudent Page Migration for Hybrid Memories

ABSTRACT. Hybrid memory composed of DRAM and PCM exploits the benefits of both types of memory. Random page placement in such memories may cause write-intensive pages to be placed in the PCM partition, which may adversely affect memory performance due to the higher write latency of PCM. Migrating write-intensive pages to DRAM helps improve memory service time. Existing techniques migrate pages whose write access count exceeds a predefined threshold. These techniques do not examine the access pattern once the choice to migrate the page has been made, which might lead to unnecessary migrations because the page may have been hot before the decision, but the number of accesses may have dropped after migration. To accurately identify hot pages, we propose an access-based prudent page migration method which uses an eDRAM buffer to migrate hot pages from PCM to DRAM. In this paper, we present a look-before-you-leap migration technique that, after a page is identified as hot, makes a considered decision on whether or not to migrate it.
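
The 13:00 talk in this session concerns Barrett modular multipliers. As background, here is a minimal Python sketch of textbook Barrett reduction for a generic modulus. It does not model the constrained-modulus form or the hardware optimizations the paper proposes, and the helper names are illustrative.

```python
def barrett_setup(q):
    """Precompute the shift amount k and the constant mu = floor(2^(2k) / q)."""
    k = q.bit_length()
    mu = (1 << (2 * k)) // q
    return k, mu


def barrett_mulmod(a, b, q, k, mu):
    """Return (a * b) mod q for 0 <= a, b < q without dividing by q at run time."""
    x = a * b                     # x < q^2 <= 2^(2k)
    t = (x * mu) >> (2 * k)       # estimate of floor(x / q), never too large
    r = x - t * q                 # non-negative remainder candidate
    while r >= q:                 # small number of correction steps
        r -= q
    return r
```

The division by `q` happens only once in `barrett_setup`; each multiplication afterwards uses shifts, multiplications, and a short correction loop, which is why the scheme suits hardware multipliers.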

13:00-14:30 Session 11B: Special Session: Memory-based computing for energy-efficient AI

Chairs:

Ismail Shahin (University of Sharjah, UAE)
Mohsen Imani (University of California, Irvine, United States)

Location: Main Building Hall B

13:00

Foroozan Karimzadeh (Georgia Institute of Technology, United States)
Mohsen Imani (University of California Irvine, United States)
Bahar Asgari (University of Maryland, United States)
Ningyuan Cao (University of Notre Dame, United States)
Yingyan Lin (Georgia Institute of Technology, United States)
Yan Fang (Kennesaw State University, United States)

Memory-based computing for energy-efficient AI: Grand challenges

14:30-14:45 Coffee Break (Main Building Dining Room)

14:45-15:30 Session 12: Poster Session

Chair:

Hasan Al-Nashash (American University of Sharjah, UAE)

Location: Engineering & Science Building Rotunda

Esha Sarkar (New York University Tandon School of Engineering, UAE)
Constantine Doumanidis (New York University Abu Dhabi, UAE)
Michail Maniatakos (New York University Abu Dhabi, UAE)

TRAPDOOR: Repurposing neural network backdoors to detect dataset bias in machine learning-based genomic analysis

ABSTRACT. Use of Machine Learning (ML) to understand underlying patterns in gene mutations (genomics) has far-reaching results in diagnosis and treatment for life-threatening diseases like cancer. Success and sustainability of ML algorithms depend on the quality and diversity of training data, and under-representation of groups (gender, race, etc.) can lead to exacerbation of systemic discrimination issues. In this work, we propose TRAPDOOR, a methodology for the identification of biased datasets by repurposing otherwise malicious neural backdoors. Our methodology can leak potential bias information about the cloud's dataset, which is collected in a collaborative setting, without hampering the genuine performance. Using a real-world cancer genomics dataset, we analyze the feasibility of leaking bias for gender and race attributes. Our experimental results show that TRAPDOOR can detect the presence of dataset bias with 100% accuracy, and furthermore can also extract the extent of bias by recovering the percentage with a small error.

Sejin Lim (Hansung University, South Korea)
Hyunjun Kim (Hansung University, South Korea)
Kyungbae Jang (Hansung University, South Korea)
Siyi Wang (Nanyang Technological University, Singapore)
Anubhab Baksi (Nanyang Technological University, Singapore)
Anupam Chattopadhyay (Nanyang Technological University, Singapore)
Hwajeong Seo (Hansung University, South Korea)

Optimized Quantum Circuit Implementation of Payoff Function

PRESENTER: Siyi Wang

ABSTRACT. Large-scale quantum computers that can execute practical quantum algorithms have the potential to solve complex problems that are currently challenging for classical computers. This involves converting these problems into a form that can be processed by quantum circuits, a crucial process that requires minimizing quantum resources like qubit count, gate count, and circuit depth. Our work focuses on implementing and optimizing the function $f_K(S) = \max(S-K, 0)$, which is a fundamental task in quantum finance known as option pricing, using quantum circuits. Taking into consideration the significant trade-offs between qubit count and circuit depth, we have developed quantum circuits for the optimized implementation of $f_K(S)$. Our work incorporates various optimization techniques for the circuit, such as selecting the optimal adder, optimizing the $S-K$ operation, parallelization, and qubit reuse. Furthermore, we offer various versions of our quantum circuits for $f_K(S)$, each featuring different adders and Toffoli decompositions, thereby providing flexibility for a wide range of use cases.
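
As a plain classical reference for the function named in the abstract above, the following Python sketch evaluates $f_K(S) = \max(S-K, 0)$ on fixed-width unsigned integers by subtracting and then using the sign bit of the difference to select the output. The subtract-then-select structure, the bit width, and the function name are assumptions made for illustration; this is not the authors' quantum circuit.

```python
def payoff(s, k, n_bits=8):
    """Classical reference for f_K(S) = max(S - K, 0) on n_bits-wide unsigned inputs."""
    limit = 1 << n_bits
    if not (0 <= s < limit and 0 <= k < limit):
        raise ValueError("inputs must fit in n_bits")
    # Compute S - K modulo 2^(n_bits + 1); the extra top bit acts as a sign flag.
    width = n_bits + 1
    diff = (s - k) & ((1 << width) - 1)
    negative = (diff >> n_bits) & 1
    # Keep the difference only when it is non-negative.
    return 0 if negative else diff
```

A reference like this is mainly useful for checking the outputs of a circuit simulation against known values.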

Cristiano Merio (Grenoble-INP / TIMA & STMicroelectronics, France)
Xavier Lesage (Grenoble-INP / TIMA & Orioma, France)
Ali Naimi (Grenoble-INP / TIMA, France)
Sylvain Engels (Grenoble-INP / TIMA & STMicroelectronics, France)
Katell Morin-Allory (Grenoble-INP / TIMA, France)
Laurent Fesquet (Grenoble-INP / TIMA, France)

Method for Data-Driven Pruning in Micropipeline Circuits

ABSTRACT. Asynchronous micropipeline circuits are an effective alternative to their synchronous counterparts for reducing dynamic power consumption, because they offer proper signaling for disabling useless blocks. In this paper, a novel approach is suggested that uses such signaling to prune irrelevant data. The result is a decrease in switching activity and dynamic power consumption, and an increase in average throughput. The method relies on new control-path elements, which conditionally prune data-path elements. Thanks to these controllers, the registers do not sample new data when pruned and subsequent data propagation is avoided. The new controllers can replace standard controllers in any micropipeline circuit without changing the architecture of the control-path. Furthermore, the outline of the methodology is given for a micropipeline circuit and the method is explained through the illustrative example of a digital filter.

Suman Deb (Nanyang Technological University, Singapore)
Anupam Chattopadhyay (Nanyang Technological University, Singapore)
Avi Mendelson (Technion, Israel)

A RISC-V SoC with Hardware Trojans: Case Study on Trojan-ing the On-Chip Protocol Conversion

ABSTRACT. Hardware Trojans (HTs) are a serious security threat to the highly-decentralized, multi-stage production flow of today’s Integrated Circuit (IC) industry. Considerable research efforts have gone into developing methodologies for detecting HTs. A significant issue in validating HT detection algorithms is the lack of open-source benchmarks with the complexity of modern-day Systems-on-Chip (SoCs). The currently available open-source benchmarks are more elementary and, therefore, do not reveal the actual robustness of the algorithms against false positives and false negatives. To address this issue, we present the design and integration of three kinds of HTs in a RISC-V-based SoC. We explain their functionality and taxonomy in detail. This is the first work that launches Trojan attacks targeting the mismatch in the attributes of two widely-used on-chip communication protocols in an SoC. We performed extensive behavioral simulations to verify the functionality of each of these HTs. We estimated the detectability of these HTs by: (i) synthesizing the HT-infested SoC for FPGA and (ii) evaluating them against a Graph Neural Network-based pre-silicon HT detection tool, automatic test pattern generation, reverse engineering, and formal verification. In a nutshell, this paper demonstrates the risk of HTs and an effective environment for strengthening research on HT detection.

Shan-Hui Chou (Institute of Electronics, National Yang Ming Chiao Tung University, Taiwan)
Ting-Yun Hsiao (Institute of Electronics, National Yang Ming Chiao Tung University, Taiwan)
Jing-Yang Jou (Institute of Electronics, National Yang Ming Chiao Tung University, Taiwan)
Juinn-Dar Huang (Institute of Electronics, National Yang Ming Chiao Tung University, Taiwan)

An Evaluation and Architecture Exploration Engine for CNN Accelerators through Extensive Dataflow Analysis

ABSTRACT. The systolic array is one of the most popular convolutional neural network accelerator architectures due to its high computation efficiency. Nevertheless, the huge design space and complicated interactions among different design parameters make it hard to find the best configuration for various applications. To overcome this issue, this paper presents an evaluation and design space exploration engine, NNeed, for systolic-array CNN accelerators through extensive dataflow analysis. It uses a highly configurable hardware template to describe accelerator operations in detail. The rapid evaluation provides PPA results, pipeline stage analysis, external memory access statistics, and so on. NNeed explores the 9-dimensional design space and supports multiple objective functions for design optimization. Experimental results show that NNeed can generate an accelerator configuration with up to 23% and 50% improvement in performance and energy, respectively, compared with a typical handcrafted design.

Kais Belwafi (C2PS Center, Khalifa University, Abu Dhabi, UAE)
Hamdan Alshamsi (C2PS Center, Khalifa University, Abu Dhabi, UAE)
Ashfaq Ahmed (C2PS Center, Khalifa University, Abu Dhabi, UAE)
Abdulhadi Shoufan (C2PS Center, Khalifa University, Abu Dhabi, UAE)

Zero-Trust Communication between Chips

PRESENTER: Kais Belwafi

ABSTRACT. Outsourcing chip production is a common practice among semiconductor vendors to cope with the increasing demand for integrated circuits. This has resulted in several security issues in the chip supply chain, including hardware trojans, intellectual property theft, and overproduction. The concept of zero trust (never trust, always verify) presents a promising solution for ensuring the authenticity of Integrated Circuits (ICs), particularly in critical systems where adversary attacks can cause significant losses or damage. The Security Protocol and Data Model (SPDM) is a reliable protocol that uses certificates to ensure the authenticity of ICs. Based on this protocol, this paper proposes a chip-to-chip zero-trust security architecture that aims to verify the authenticity of any connected peripheral before its use. The contributions include an overview of the proposed architecture, implementation and formal verification of the SPDM protocol, and analysis of the challenges encountered during the implementation and execution.

Ye Ziyang (Graduate School of Engineering, The University of Tokyo, Japan)
Makoto Ikeda (Graduate School of Engineering, The University of Tokyo, Japan)

Dynamic Digital Circuit Locking (DDCL): A Shield against Static Analysis Attacks

PRESENTER: Ye Ziyang

ABSTRACT. With the rise of the fabless business model, security threats, including Intellectual Property (IP) theft, overproduction, counterfeiting, and reverse engineering, have increased. This paper introduces Dynamic Digital Circuit Locking (DDCL) as a method to counteract these threats. At its core, DDCL utilizes dynamic logic gates for locking. These gates mimic the operation of standard logic gates through a dynamic process, thereby exploiting the reliance of adversaries on static digital circuit analysis. As a result, DDCL can resist all static analysis attacks more effectively than conventional techniques. DDCL surpasses earlier methods through its reliance on logic loops for proper operation, which makes loop-breaking attacks less effective. However, the advanced security offered by DDCL also presents challenges, such as increased power consumption and circuit complexity. This paper further examines the structure, security aspects, and comparative performance of DDCL. It underscores its value in multi-vendor scenarios and its compatibility with existing IP cores, which require only minor changes to original designs, thereby illustrating the practical role of DDCL in enhancing hardware security.

Chun-Jen Tsai (National Yang Ming Chiao Tung University, Taiwan)
Chun Wei Chao (National Yang Ming Chiao Tung University, Taiwan)
Sheng-Di Hong (National Yang Ming Chiao Tung University, Taiwan)

Integrated Dynamic Memory Manager for a RISC-V Processor

ABSTRACT. In this paper, we present an open-source RISC-V processor with an integrated dynamic memory manager hardware module. Traditionally, the management of the main memory of a computing system is handled by a software library. However, the process involves searching and manipulating the linked lists of memory blocks, which can be expensive when the memory becomes fragmented. As a result, for embedded systems that have to be online for a long duration, a static data structure is often used to reduce the overhead of dynamic memory management at the cost of less software flexibility. Nevertheless, modern VLSI technology allows the efficient implementation of hardwired resource managers directly in the processor microarchitecture for better performance. As the experiments in this paper show, a hardware memory manager integrated within the processor core can be much more efficient than using a software library. Hardwired resource managers are particularly useful for IoT devices since the processors typically run at a lower clock rate. The proposed architecture is implemented and verified on a Xilinx FPGA development board and will be made open source.

15:30-17:00 Session 13: Analog, Mixed Signal, and RF Designs

Chairs:

Salvador Mir (TIMA Laboratory, France)
Jaime Viegas (Khalifa University, UAE)

Location: Main Building Hall A

15:30

Solomon Micheal Serunjogi (New York University, UAE)
Mihai Sanduleanu (Khalifa University, UAE)

3.125GS/s, 4.9 ENOB, 109 fJ/Conversion Time-Domain ADC for Backplane Interconnect

ABSTRACT. This paper presents a flash time-domain ADC with a T/H amplifier, a voltage-controlled delay line, and a time-to-digital converter. The design operates at 3.125GS/s with 4.9 ENOB and a Walden figure of merit of 109fJ/Conversion. Automatic calibration means are provided as well. For measurement purposes, an integrated memory is provided. It consumes 16.2mW from a 1V supply. It was realized in the 45nm PDSOI process from GlobalFoundries.

15:48

Shahid Jamil (NUCES (FAST-NU), Pakistan)
Muhammad Usman (NUCES (FAST-NU), Pakistan)
Muhammad Jawad Shakil (NUCES (FAST-NU), Pakistan)
Jafar Hussain (NUCES (FAST-NU), Pakistan)
Rashad Ramzan (NUCES (FAST-NU), Pakistan)

Bi-Directional Time Domain Duplexing (TDD) Amplifier for 5G Applications

ABSTRACT. In Time Division Duplex (TDD) systems, the transmitter and receiver are not ON simultaneously. This is done to avoid saturation of the receiver’s LNA by the high power that can leak into the receiver due to limited isolation. Since either the PA or the LNA is working at a given instant of time, area and power can be saved by designing a circuit that can work as an LNA or a PA at different instants in time. Low power and reduced Si area lead to improved reliability and reduced cost. This work presents the design of a bi-directional amplifier (BDA) in the sub-5GHz range. The design contains a two-stage inductor-less LNA along with transistor-based switches and 50Ω drivers for off-chip testing. The design is implemented in the SkyWater 130nm CMOS open-source PDK. The results show 29.44dB voltage gain in reception mode (without buffer) and less than 3.5dB noise figure in the 1-4.5GHz band with input matching of -10dB. The 1-dB compression point in the transmission mode is 19dBm with a power consumption of 58mW. The active chip area of the design is 204×166 μm².

16:06

Yiyang Yu (King Abdullah University of Science and Technology, Saudi Arabia)
Atif Shamim (King Abdullah University of Science and Technology, Saudi Arabia)

Gain Enhancement of Antenna-on-Chip at 94 GHz with an Integrated Artificial Magnetic Conductor for 6G System-on-Chip

ABSTRACT. The silicon-based complementary metal oxide semiconductor (CMOS) process has become one of the most popular processes to realize a system-on-chip (SoC). However, as one of the essential components of a wireless SoC, antennas typically suffer from poor radiation because of the highly conductive silicon substrate. Such antennas are known as antenna-on-chip (AoC). To enhance the radiation performance of the AoC, an artificial magnetic conductor (AMC) with double periodic strip structure layers is proposed in this paper that can not only provide in-phase reflection but also isolate the antenna from the lossy silicon substrate. The proposed AMC shows a gain enhancement of 5 dB. The AMC-backed AoC is well-matched within 76-123 GHz and provides a boresight gain of 2.5 dBi at 94 GHz.

16:24

Junjie Li (School of Information Science and Engineering, Southeast University, China)
Youming Zhang (School of Cyber Science and Engineering, Southeast University, China)
Yunqi Cao (School of Cyber Science and Engineering, Southeast University, China)
Xusheng Tang (School of Cyber Science and Engineering, Southeast University, China)
Fengyi Huang (School of Cyber Science and Engineering, Southeast University, China)

A Unity Feedback Length-Extend Delta-Sigma Modulator for Fractional-N Frequency Synthesizer

ABSTRACT. A unity feedback length-extend multistage noise-shaping (MASH) delta-sigma modulator (DSM) is presented in this paper. The proposed length extension technique adds a feedback path, like HK-MASH, and fixes the feedback factor to unity. In this way, only one additional register is needed, which decreases the operating time and the hardware cost of the DSM and achieves a maximum sequence length of $M-1$ (where $M = 2^{n_0}$ and $n_0$ is the input word length). Using this structure, MASH DSMs are optimized, with the maximum sequence length extending to $(M-1)^l$ (where $l$ is the order of the MASH DSM) and the minimum extending to $N(M-1)^{l-1}$ (where $N$ is the smallest prime factor of $M-1$). This paper proves that the output sequence length is exponentially increased, regardless of the input value. Compared with the classical structure, the proposed MASH 1-1-1 structure shows a spur-free performance at the expense of limited hardware cost.

16:42

Muhammad Jawad Shakil (National University of Computer and Emerging Sciences (FAST-NU), Islamabad, Pakistan)
Uzair Ahmad (National University of Computer and Emerging Sciences (FAST-NU), Islamabad, Pakistan)
Jafar Hussain (National University of Computer and Emerging Sciences (FAST-NU), Islamabad, Pakistan)
Hassan Saif (National University of Computer and Emerging Sciences (FAST-NU), Islamabad, Pakistan)
Rashad Ramzan (National University of Computer and Emerging Sciences (FAST-NU), Islamabad, Pakistan)

A Bondwire Inductor Based Flash ADC Assisted DC-DC Buck Converter

PRESENTER: Muhammad Jawad Shakil

ABSTRACT. A Flash ADC-assisted fast transient response DC-DC buck converter is proposed in this paper. The proposed inductor-based DC-DC converter operates in continuous conduction mode (CCM). The novelty in architecture includes replacing the SAR ADC with a Flash ADC, which converts the transient overshoot/undershoot into the corresponding binary code at a faster rate than a SAR ADC. Depending on the overshoot and undershoot set criteria, the current pump injects current into the output node to charge/discharge it, minimizing the overshoot/undershoot and settling time. In the proposed design, the required inductance is achieved by using bond-wire inductance, which reduces the chip's active area and the number of off-chip components. ADS simulations are performed for the estimation of inductor values using the QFN-64 package. The proposed DC-DC converter occupies an active area of 0.91mm² in a TSMC 130 nm bulk CMOS process. Post-layout simulation results show a peak efficiency of 84% at V_in of 3.3 V and V_out of 1.8 V with a load current I_L of 500 mA. The measured overshoot/undershoot in simulation is 70/60 mV with 530/469 ns recovery time and 7 mV output ripple. This enables the designed converter to be used in aerospace, satellite, and military applications, where compactness is required.
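
For orientation on the 16:24 talk, the following Python sketch simulates a classical MASH 1-1-1 modulator built from three cascaded accumulators with the usual error-cancellation network. It illustrates the conventional structure that the abstract compares against, not the proposed unity-feedback length-extension technique; the function and variable names are illustrative.

```python
def mash111(x, n0, n_samples):
    """Simulate a classical MASH 1-1-1 DSM with n0-bit accumulators.

    x is the constant input word (0 <= x < 2**n0). Returns the output
    sequence, whose long-run mean approximates x / 2**n0.
    """
    m = 1 << n0
    s1 = s2 = s3 = 0            # accumulator states, all starting at zero
    c2_d1 = c3_d1 = c3_d2 = 0   # delayed carries for the cancellation network
    out = []
    for _ in range(n_samples):
        # Stage 1 accumulates the input; each later stage accumulates the
        # residue (quantization error) left by the stage before it.
        c1, s1 = divmod(s1 + x, m)
        c2, s2 = divmod(s2 + s1, m)
        c3, s3 = divmod(s3 + s2, m)
        # y = c1 + (1 - z^-1) c2 + (1 - z^-1)^2 c3
        out.append(c1 + (c2 - c2_d1) + (c3 - 2 * c3_d1 + c3_d2))
        c2_d1, c3_d2, c3_d1 = c2, c3_d1, c3
    return out
```

With zero initial conditions, the output of this conventional structure repeats with a period that depends on the input word, which is the sequence-length limitation that length-extension techniques aim to address.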

18:00-21:00 Social Event: Trip to Souq Madinat Jumeirah, Dubai