SASIMI 2018: THE 21ST WORKSHOP ON SYNTHESIS AND SYSTEM INTEGRATION OF MIXED INFORMATION TECHNOLOGIES
PROGRAM FOR TUESDAY, MARCH 27TH

09:10-10:00 Session I2: Invited Talk II
Chair:
Hiroshi Saito (University of Aizu, Japan)
09:10
Elena Dubrova (Royal Institute of Technology, Sweden)
Lightweight Cryptographic Primitives for Resource-Constrained Devices
SPEAKER: Elena Dubrova

ABSTRACT. Today minimal or no security is typically provided to low-end wireless devices in the conventional belief that the information they gather is of little concern to attackers. However, studies have shown that a compromised sensor can be used as a stepping stone to mount an attack on the network. As the number and types of connected devices grow, so do the security risks. In this talk we address several aspects of this important, multifaceted problem. First, we present cryptographic primitives which can assure confidentiality and integrity of data while satisfying the resource constraints of low-end devices. Second, we describe lightweight countermeasures against hardware Trojans. Finally, we show how Physical Unclonable Functions (PUFs) can be used for assuring tamper-resistance at low cost.

10:00-11:50 Session R3: Regular Poster Session III
Chairs:
Hiromitsu Awano (The University of Tokyo, Japan)
Masanori Muroyama (Tohoku University, Japan)
10:00
Salita Sombatsiri (NEC Corporation, Japan)
Seiya Shibata (NEC Corporation, Japan)
Yuki Kobayashi (NEC Corporation, Japan)
Hiroaki Inoue (NEC Corporation, Japan)
Takashi Takenaka (NEC Corporation, Japan)
Takeo Hosomi (NEC Corporation, Japan)
Parallelism-Flexible Convolution Core for Sparse Convolutional Neural Networks

ABSTRACT. This paper proposes a convolution core for sparse CNNs that can flexibly alternate the parallelism scheme and degree, exploiting intra- and inter-output parallelism of a CNN's convolutional layer, and that leverages weight sparsity effectively using compressed sparse weights in CSC format and an output-stationary dataflow. The experimental results show that the performance is improved by 3.9 times even in the deeper layers, where the conventional accelerator could not fully exploit the parallelism due to the small layer size. The proposed architecture can also exploit weight sparsity; by combining multi-parallelism and weight sparsity, it achieves 5.2 times better performance than the conventional accelerator.

10:02
Zuitoku Shin (Kyoto University, Japan)
Shumpei Morita (Kyoto University, Japan)
Song Bian (Kyoto University, Japan)
Michihiro Shintani (Nara Institute of Science and Technology, Japan)
Masayuki Hiromoto (Kyoto University, Japan)
Takashi Sato (Kyoto University, Japan)
Comparative Study of Delay Degradation Caused by NBTI Considering Stress Frequency Dependence
SPEAKER: Zuitoku Shin

ABSTRACT. The degradation of transistors in integrated circuits is known to depend on stress frequency in addition to the well-known stress duty cycle. This paper analyzes the impact of the frequency dependence of NBTI degradation on a processor-scale circuit under various workload scenarios by using different levels of available information. A simple estimation of wire switching frequency from duty cycle is also proposed. Using real workloads running on a MIPS processor, it is found that the frequency dependency of the worst path delay is not large, since there are many DC-stress components independent of the frequency. However, the frequency dependency of the path delay increases when the DC component decreases due to the execution of multiple applications.

10:04
Shuhei Ishino (Tokyo University of Agriculture and Technology, Japan)
Mitsuru Hasegawa (Tokyo University of Agriculture and Technology, Japan)
Kunihiro Fujiyoshi (Tokyo University of Agriculture and Technology, Japan)
A Method of Layout Pattern Classification Using Clustering
SPEAKER: Shuhei Ishino

ABSTRACT. In recent years, pattern-matching-based hotspot detection methods have been proposed. In these methods, hotspots are detected based on a hotspot library created by classifying layout clips (called "Layout Pattern Classification"). In the classification, layout clips centered on hotspots are classified into groups to create a hotspot library, which is required to be as compact as possible. In this paper, we propose a classification method that uses clustering before ILP to divide the problem into small subproblems, so that classification by ILP becomes possible in practical time. Experimental results indicate the effectiveness of the proposed method.

10:06
Yu-Guang Chen (Yuan Ze University, Taiwan)
Kun-Wei Chiu (National Cheng Kung University, Taiwan)
Ing-Chao Lin (National Cheng Kung University, Taiwan)
A Novel NBTI-Aware Wake-up Strategy for Power-Gated Designs
SPEAKER: Yu-Guang Chen

ABSTRACT. Power gating has become one of the most effective ways to reduce leakage power. However, turning on sleep transistors simultaneously may induce excessive surge current and threaten signal integrity. Therefore, the sleep transistor wake-up sequence should be carefully designed to avoid significant surge current. On the other hand, sleep transistors may suffer from Negative-Bias Temperature Instability (NBTI), and the wake-up time increases after aging. Conventional wake-up strategies do not take the NBTI effect into consideration and may result in longer or unacceptable wake-up time after circuit aging. This paper proposes a novel NBTI-aware wake-up strategy to address this issue. Our strategy first finds a set of proper wake-up sequences under different aging circumstances, and then dynamically reconfigures wake-up sequences at runtime based on the on-line aging scenario. Experimental results show that, compared with a traditional fixed wake-up sequence approach, our strategy can reduce wake-up latency by up to 45% with only 3.7% extra area overhead.

10:08
Chunfeng Liu (Technical University of Munich, Germany)
Tsung-Yi Ho (National Tsing Hua University, Taiwan)
Test Vector Generation for Microfluidic Fully Programmable Valve Arrays (FPVAs)
SPEAKER: Tsung-Yi Ho

ABSTRACT. Fully Programmable Valve Array (FPVA) has emerged as a new architecture for the next-generation flow-based microfluidic biochips. This 2D-array consists of regularly-arranged valves, which can be dynamically configured by users to realize microfluidic devices of different shapes and sizes as well as interconnections. Additionally, the regularity of the underlying structure renders FPVAs easier to integrate on a tiny chip. However, these arrays may suffer from various manufacturing defects such as blockage and leakage in control and flow channels. Unfortunately, no efficient method is yet known for testing such a general-purpose architecture. In this paper, we present a novel formulation using the concept of flow paths and cut-sets, and describe an ILP-based hierarchical strategy for generating compact test sets that can detect multiple faults in FPVAs. Simulation results demonstrate the efficacy of the proposed method in detecting manufacturing faults with only a small number of test vectors.

10:10
Yi-Jung Chen (National Chi Nan University, Taiwan)
Wen-Wei Chang (National Chi Nan University, Taiwan)
Chia-Yin Liu (National Chi Nan University, Taiwan)
Bo-Yuan Chen (National Chi Nan University, Taiwan)
Ming-Ying Tsai (National Chi Nan University, Taiwan)
Processor and Memory Co-Allocation for MPSoCs with Single-ISA Heterogeneous Multi-Core Architecture
SPEAKER: Yi-Jung Chen

ABSTRACT. Single-ISA (Instruction Set Architecture) heterogeneous multi-processor architecture combines the ease of development of a homogeneous architecture with the resource customization of a heterogeneous architecture. Targeting MPSoCs with this architecture, we propose a processor and memory resource allocation method to optimize the performance of the given workload while the area constraint is met. To bring out the best performance of a selected hardware configuration, the proposed method also adjusts the software design of task and data mapping accordingly. The experimental results show that, compared to a synthesis method that considers the processor only, the proposed method achieves up to 70% performance improvement on real-world workloads.

10:12
Louis Y.-Z. Lin (National Chiao Tung University, Taiwan)
Charles H.-P. Wen (National Chiao Tung University, Taiwan)
Accelerating Deterministic Parallel Test Pattern Generation by Hiding Latency Among Multi-threads

ABSTRACT. With sufficient computing power, parallelism is a reliable solution to accelerate test pattern generation (TPG). Existing parallel TPG can reach linear speedup and reduce the test pattern count. However, these works are mostly non-deterministic, and the test set depends on computational resources. Although a synchronized dynamic compaction (SDC) parallel TPG has been proposed to give deterministic results, it suffers from long waiting time. This paper presents a hiding latency (HL) TPG to accelerate the SDC process. Experimental results show that the HL-TPG generates the same test pattern set as the conventional serial TPG with dynamic compaction. Moreover, the proposed method reduces the time cost by 8% using 8 threads compared to the SDC parallel TPG.

10:14
Li-Chin Chen (NCTU Taiwan, Taiwan)
Hung-Ming Chen (NCTU Taiwan, Taiwan)
Learning to Predict DRC Violations During Placement

ABSTRACT. Since the routability of a circuit is a critical challenge due to the complexity of design rules, is it possible to predict design rule violations during the placement stage? There is a growing gap between global routing and the actual violations in detailed routing. This miscorrelation may leave a design unroutable after detailed routing. In this study, a methodology based on machine learning is proposed to effectively predict detailed routing violations. After extracting appropriate features from placement, fast global routing, and detailed routing violations, we use support vector machine (SVM) techniques to train the prediction model, unlike a regression framework. Some preliminary experiments show that the proposed approach can effectively forecast design rule violations during the placement stage.

10:16
Kazuho Katsumata (Japan Advanced Institute of Science and Technology, Japan)
Junghoon Oh (Japan Advanced Institute of Science and Technology, Japan)
Mineo Kaneko (Japan Advanced Institute of Science and Technology, Japan)
Register Binding in Datapath Synthesis Considering Post-Silicon Skew Tunability
SPEAKER: Junghoon Oh

ABSTRACT. Post-Silicon Skew Tuning is one of the promising techniques to overcome variability-related problems. This paper proposes a register binding method in high-level synthesis of datapaths considering post-silicon skew tunability. The objective of this optimization is to minimize the minimum clock period achieved by post-silicon chip-by-chip skew tuning. In the proposed method, resource sharing is repeated incrementally, guided by performance evaluation using Monte-Carlo simulation. The experimental results show the trade-off between the degree of resource sharing and stochastic speed performance, and also the existence of catastrophic degradation of speed performance as resource sharing progresses.

10:18
Tadaaki Tanimoto (Renesas Electronics Corporation, Japan)
Differential Update of Automotive Control Device Firmware

ABSTRACT. The existing tools for software updates in workstations, mobile/smart phones, and WSN (Wireless Sensor Network) nodes cannot be used with severely resource-constrained, hard real-time automotive control units. Such systems typically employ an MMU-less (MMU: Memory Management Unit) monolithic execution environment due to its cost and real-time effectiveness. This paper deals with differential firmware update (re-programming) for such systems by describing 1) a differential firmware update mechanism with safety fallback, 2) difference generation and its difficulties, and 3) a provably secure firmware update protocol within the in-vehicle network, along with security considerations for its real usage.

10:20
Ying-Chi Wei (National Chiao Tung University, Taiwan)
Hong-Yan Su (National Chiao Tung University, Taiwan)
Radhamanjari Samanta (National Chiao Tung University, Taiwan)
Yih-Lang Li (National Chiao Tung University, Taiwan)
LESAR: A Dynamic Line-End Spacing Aware Detailed Router
SPEAKER: Hong-Yan Su

ABSTRACT. As VLSI technology scales down, 193nm optical lithography reaches its limit, and the one-dimensional (1D) unidirectional style lithography technique emerges as one of the most promising solutions for coming advanced technology nodes. The VLSI industry is advancing from traditional 2D style techniques towards extremely regular 1D style design. The 1D process first generates unidirectional dense metal lines and then uses line-end cutting to form the target patterns with cut masks. The cut patterns give rise to several challenges. If cuts are too close, they will lead to conflicts. Line-end spacing rules become dynamic rather than static because of the cut mask, and they now also need to be followed strictly. As multiple patterning lithography techniques become inevitable for advanced technology nodes, the line-end spacing check between two line-end pairs in the same mask has also been regarded as a compulsory line-end spacing constraint that has not yet been discussed in previous works. Complying with these rules during APR has become a new bottleneck. In this work, we propose to make the router aware of the dynamic line-end spacing rules, including end-end spacing and parity spacing constraints. Experimental results of our proposed router demonstrate that it can effectively eliminate all end-end spacing violations as well as 75% of parity spacing violations with a reasonable runtime increase of 14%.

10:22
Chung-Cheng Su (National Chiao Tung University, Taiwan)
Yi-Cheng Hsieh (National Chiao Tung University, Taiwan)
Jia-Heng Chang (National Chiao Tung University, Taiwan)
Chung-Chih Hung (National Chiao Tung University, Taiwan)
A 12-bit 10MS/s SAR ADC with Mixed Switching and Background Offset Calibration

ABSTRACT. This paper presents a 12-bit 10MS/s successive approximation register (SAR) analog-to-digital converter (ADC) in TSMC 0.18-µm process. To reduce the switching energy and save the total capacitance, a mixed switching procedure with the split capacitor is applied. The mixed switching procedure combines merged capacitor switching with monotonic switching. Meanwhile, a dynamic comparator with a charge pump is used to achieve low offset. Using a 1.8V supply voltage and a sampling rate of 10MS/s, the SAR ADC achieves a signal-to-noise-and-distortion ratio (SNDR) of 66.73dB and an effective number of bits (ENOB) of 10.79 bit with 1.975MS/s input frequency. Its power consumption is 736.23µW and its figure-of-merit (FOM) is 41.58 fJ/conversion-step.
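
As a consistency check on the figures quoted in this abstract, the following minimal sketch recomputes ENOB and the figure of merit. It assumes the common relation ENOB = (SNDR - 1.76) / 6.02 and the Walden figure of merit P / (2^ENOB * f_s); the abstract does not state which FOM definition the authors used, so that choice, like the function names, is an assumption.

```python
def enob_from_sndr(sndr_db):
    """Effective number of bits from SNDR in dB (ideal-quantizer relation)."""
    return (sndr_db - 1.76) / 6.02


def walden_fom_fj(power_w, enob_bits, sample_rate_hz):
    """Walden figure of merit in fJ per conversion step (assumed definition)."""
    joules_per_step = power_w / (2 ** enob_bits * sample_rate_hz)
    return joules_per_step * 1e15


# Values taken from the abstract: 66.73 dB SNDR, 736.23 uW, 10 MS/s.
enob = enob_from_sndr(66.73)
# The FOM is evaluated with the rounded ENOB of 10.79 bit quoted in the abstract.
fom = walden_fom_fj(736.23e-6, 10.79, 10e6)
print(f"ENOB = {enob:.2f} bit, FOM = {fom:.2f} fJ/conversion-step")
```

Under these assumptions the recomputed values agree with the reported ENOB and FOM to the quoted precision.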

10:24
Yu-Yi Wu (Chung Yuan Christian University, Taiwan)
Shih-Hsu Huang (Chung Yuan Christian University, Taiwan)
Test Wrapper Chain Design for Three-Dimensional SoCs under TSV Count Constraint

ABSTRACT. A system-on-chip (SoC) design consists of many embedded cores. In order to test these embedded cores, modular wrapper design needs to connect scan elements to form test wrapper chains. Since the longest test wrapper chain affects the test time, how to balance these wrapper chains is an important topic. As the feature size continues to shrink, wire length has become a serious concern. Three-dimensional integrated circuits (3D ICs) provide a promising solution, but they also bring a challenge in modular wrapper design. Since scan elements in a 3D IC span multiple layers, we not only need to balance these wrapper chains but also need to reduce the number of through-silicon vias (TSVs) used. However, the previous work assigns scan elements without considering TSV usage and, as a result, often leads to a large TSV count. Based on that observation, in this paper, we propose a two-stage algorithm for test wrapper chain optimization. Different from the previous work, our objective is to minimize the test time under a given TSV count constraint. Experiments with ITC'02 benchmark circuits consistently show that our approach can achieve near-minimum test time under the given TSV count constraints.

10:26
Tetsuaki Fujimoto (Ritsumeikan University, Japan)
Wataru Takahashi (NEC Corporation, Japan)
Kazutoshi Wakabayashi (NEC Corporation, Japan)
Takashi Imagawa (Ritsumeikan University, Japan)
Hiroyuki Ochi (Ritsumeikan University, Japan)
Novel Implementation of FFT for Mixed Grained Reconfigurable Architecture Using Via-switch

ABSTRACT. This paper proposes an optimal implementation of an FFT circuit for a mixed-grained reconfigurable architecture using via-switches. In the target architecture, the programmable routing resources are implemented in the metal layers thanks to the via-switch, and as a result, a rich amount of functional resources can be implemented on the substrate layer. To make full use of the rich arithmetic resources, the proposed method realizes high-speed, one-cycle-per-stage, fully parallelized processing by using N/2 butterfly units for an N-point FFT. It also introduces a fixed-stride FFT that makes the data access pattern fixed for all stages, to reduce multiplexers drastically. Compared with the Cooley-Tukey FFT, the required logic blocks for a 64-point FFT are reduced by about 29%.
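
To illustrate what a stage-independent data access pattern can look like, here is a minimal software sketch of a constant-geometry radix-2 FFT, in which every stage reads inputs i and i + N/2 and writes outputs 2i and 2i + 1. Whether the paper's fixed-stride FFT uses exactly this formulation is an assumption; the function name and the software setting are purely illustrative, not the authors' hardware design.

```python
import cmath


def constant_geometry_fft(samples):
    """Radix-2 decimation-in-frequency FFT with the same wiring in every stage."""
    n = len(samples)
    stages = n.bit_length() - 1
    if n < 2 or (1 << stages) != n:
        raise ValueError("length must be a power of two and at least 2")
    data = [complex(v) for v in samples]
    half = n // 2
    for stage in range(stages):
        block = 1 << stage
        nxt = [0j] * n
        # N/2 butterflies per stage; only the twiddle factors change per stage.
        for i in range(half):
            exponent = (i // block) * block
            twiddle = cmath.exp(-2j * cmath.pi * exponent / n)
            a, b = data[i], data[i + half]
            nxt[2 * i] = a + b
            nxt[2 * i + 1] = (a - b) * twiddle
        data = nxt
    # The result comes out in bit-reversed order; reorder to natural order.
    result = [0j] * n
    for pos, value in enumerate(data):
        rev = int(format(pos, f"0{stages}b")[::-1], 2)
        result[rev] = value
    return result


print(constant_geometry_fft([1, 0, 0, 0, 0, 0, 0, 0]))
```

Because the read and write indices never depend on the stage number, a hardware realization of this pattern would not need per-stage input multiplexers in front of the butterfly units, which is the kind of saving the abstract describes.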

10:28
Chun-Chia Kuo (CMSC, Inc., Taiwan)
Yi-Yu Liu (National Taiwan University of Science and Technology, Taiwan)
Voltage-drop Aware Timing Analysis for Pessimism Design Constraint Prevention
SPEAKER: Chun-Chia Kuo

ABSTRACT. Voltage-drop effects severely impact the nominal circuit delay in advanced technology nodes. In this work, we analyze how voltage drop and ground bounce effects affect the delay of standard cells, and propose a voltage-drop aware timing analysis technique, VATA, to estimate circuit delay. We import voltage drop information for detailed circuit delay estimation with pre-generated multiple-Vdd standard cell libraries. The experimental results indicate that our VATA technique can identify the real critical path delay, taking the IR-drop effect into account, rather than using a single pessimistic IR-drop hard constraint for the entire design.

10:30
Tsutomu Sasao (Meiji University, Japan)
Kyu Matsuura (Meiji University, Japan)
Yukihiro Iguchi (Meiji University, Japan)
A Method to Identify Affine Equivalence Classes of Logic Functions
SPEAKER: Tsutomu Sasao

ABSTRACT. Two logic functions are affine equivalent (A-equivalent) iff one can be obtained from the other by an affine transformation of the input variables. This paper shows an efficient method to check the A-equivalence of two logic functions using the autocorrelation spectrum and Walsh spectrum. Experimental results of up to six variables are shown.
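
The following minimal sketch shows one way a Walsh spectrum can serve as a filter for A-equivalence. It relies on the standard fact that an affine transformation of the inputs only permutes the Walsh coefficients and flips their signs, so the sorted absolute spectrum is an invariant. This is a necessary condition only, it omits the autocorrelation spectrum, and it is not claimed to be the authors' method; all names are hypothetical.

```python
def walsh_spectrum(truth_table):
    """Fast Walsh-Hadamard transform of (-1)^f(x) for a 0/1 truth table."""
    values = [1 - 2 * bit for bit in truth_table]
    n = len(values)
    step = 1
    while step < n:
        for start in range(0, n, 2 * step):
            for i in range(start, start + step):
                a, b = values[i], values[i + step]
                values[i], values[i + step] = a + b, a - b
        step *= 2
    return values


def affine_invariant_signature(truth_table):
    """Sorted absolute Walsh coefficients; equal for A-equivalent functions."""
    return sorted(abs(c) for c in walsh_spectrum(truth_table))


def make_truth_table(func, num_vars):
    """Evaluate func on all inputs; bit k of the index is variable x_k."""
    return [func(*[(x >> k) & 1 for k in range(num_vars)])
            for x in range(1 << num_vars)]


f = make_truth_table(lambda x0, x1, x2: x0 & x1, 3)
# g is f composed with the invertible affine map (x0 ^ x2, x1 ^ 1, x2).
g = make_truth_table(lambda x0, x1, x2: (x0 ^ x2) & (x1 ^ 1), 3)
h = make_truth_table(lambda x0, x1, x2: x0 & x1 & x2, 3)

print(affine_invariant_signature(f) == affine_invariant_signature(g))
print(affine_invariant_signature(f) == affine_invariant_signature(h))
```

Functions with different signatures cannot be A-equivalent; functions with equal signatures still need a further check before equivalence can be concluded.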

10:32
Soichiro Ito (Ritsumeikan University, Japan)
Yoshiki Tsuchida (Ritsumeikan University, Japan)
Masahiro Fukui (Ritsumeikan University, Japan)
Development and Evaluation of a Magnetic Resonant Coupling Wireless Power Transfer System for Multiple Receivers
SPEAKER: Soichiro Ito

ABSTRACT. This research proposes a new magnetic resonant WPT (wireless power transfer) system which transfers power to multiple receivers. In general, the received power at each receiver differs depending on the distance between the transfer coil and the receiver coil. However, the proposed system controls the received power by varying the resistance of the receivers. The optimum shape of the transfer coil and size of the receiver coils are also considered.

10:34
Saki Yamaguchi (University of Kitakyushu, Japan)
Yasuhiro Takashima (University of Kitakyushu, Japan)
Relaxed Routing Problem with Constraint Satisfaction Problem

ABSTRACT. This paper proposes a relaxed routing problem formulated as a constraint satisfaction problem. In the method, we utilize a one-hot coding for the routing representation and divide a net with three or more terminals into several two-terminal nets. We confirm the efficiency of the proposed method empirically.

10:36
Masashi Imai (Hirosaki University, Japan)
Naoya Onizawa (Tohoku University, Japan)
Takahiro Hanyu (Tohoku University, Japan)
Tomohiro Yoneda (National Institute of Informatics, Japan)
Minimum Power Supply Asynchronous Circuits for Re-initialization Free Computing
SPEAKER: Masashi Imai

ABSTRACT. In energy harvesting systems, re-initialization of computing may be required to complete the whole operation, since the power supply is unstable. In this paper, minimum power supply asynchronous circuits using only CMOS devices are presented. In the proposed system, two voltage lines are used. One supply voltage may be off while the other is always on. When power instability is detected, the processed data are evacuated to backup latches whose supply voltage is always on. When the power supply recovers, the backup data are restored to the corresponding storage elements and operation continues without re-initialization.

11:50-13:20 Lunch Break
13:20-14:10 Session I3: Invited Talk III
Chair:
Yukihide Kohira (The University of Aizu, Japan)
13:20
Iris Hui-Ru Jiang (National Taiwan University, Taiwan)
Timing is Everything!

ABSTRACT. Timing analysis is a process that verifies the timing performance of a design under worst-case conditions. In the modern IC design flow, timing analysis is essential in identifying timing critical paths and avoiding wasteful over-optimization.

In this talk, we investigate several key issues that should be handled by timing analysis tools to facilitate design closure for modern IC designs: 1) Common path pessimism removal: To capture the timing performance of a design more accurately, common path pessimism removal is widely used to eliminate artificially induced pessimism in clock paths during timing analysis. 2) Incremental timing analysis: Performance-driven optimizations are repeatedly performed throughout the modern IC design flow, so incrementally updating timing information efficiently and accurately becomes a crucial task for fast turnaround time. 3) Timing path search: Along with intensive optimizations, fast timing path search guides designers to fix timing violations. 4) Timing macro modeling: As the EDA paradigm shifts from flat to hierarchical frameworks, compact and accurate timing macro modeling is the key to enabling efficient and accurate hierarchical timing analysis. Recent research advances on these issues and future directions will also be discussed in this talk.
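
As a worked illustration of item 1, the sketch below computes a setup slack with and without common path pessimism removal. All delay values are made up for illustration, and the simplified slack formula (one launch path, one capture path, a single shared clock segment) is an assumption rather than a description of any particular timing tool.

```python
def setup_slack(period, common_early, common_late, launch_branch_late,
                capture_branch_early, data_delay_late, setup_time,
                remove_common_pessimism):
    """Setup slack in ps for one launch/capture flip-flop pair."""
    # Launch clock and data use late (slow) delays; capture clock uses early ones.
    arrival = common_late + launch_branch_late + data_delay_late
    required = period + common_early + capture_branch_early - setup_time
    slack = required - arrival
    if remove_common_pessimism:
        # The shared clock segment cannot be both slow and fast at once,
        # so the late/early difference on it is credited back.
        slack += common_late - common_early
    return slack


# Hypothetical delays in picoseconds.
args = dict(period=1000, common_early=180, common_late=200,
            launch_branch_late=50, capture_branch_early=40,
            data_delay_late=950, setup_time=30)

pessimistic = setup_slack(remove_common_pessimism=False, **args)
corrected = setup_slack(remove_common_pessimism=True, **args)
print(pessimistic, corrected)
```

With these example numbers the path appears to violate setup by 10 ps before the correction and meets timing by 10 ps after it, which is the kind of wasteful over-optimization the technique is meant to avoid.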

14:10-16:00 Session R4: Regular Poster Session IV
Chairs:
Nobutaka Kito (Chukyo University, Japan)
Kenshu Seto (Tokyo City University, Japan)
14:10
Yuta Nagaoka (Kyoto University, Japan)
Tohru Ishihara (Kyoto University, Japan)
Hidetoshi Onodera (Kyoto University, Japan)
Energy and Delay Optimized Multiplexer-tree Structure for Scaled Voltage Operation
SPEAKER: Yuta Nagaoka

ABSTRACT. In this paper, we propose a MUX structure suitable for FPGAs that operate in a wide supply voltage range. The key idea is to mix two types of MUX gates, based respectively on Transmission-Gate (TG for short) and And-Or-Invert 22 (AOI22 for short), so that the delay of the MUX-tree is suppressed. With a simple delay model for 2:1 MUX gate structures, a suitable structure for both speed and energy efficiency is derived. An evaluation based on measurements in a 65-nm process technology shows that the proposed structure consumes up to 30.1% less energy than the TG-based structure.

14:12
Ryota Uematsu (Hokkaido University, Japan)
Kota Ando (Hokkaido University, Japan)
Kodai Ueyoshi (Hokkaido University, Japan)
Kazutoshi Hirose (Hokkaido University, Japan)
Masayuki Ikebe (Hokkaido University, Japan)
Tetsuya Asai (Hokkaido University, Japan)
Shinya Takamaeda-Yamazaki (Hokkaido University, Japan)
Masato Motomura (Hokkaido University, Japan)
Exploring CNN Accelerator Design Space on a Dynamically Reconfigurable Hardware Platform
SPEAKER: Ryota Uematsu

ABSTRACT. Convolutional neural networks (CNNs) have a number of layers where the computational requirements differ vastly. Commonly examined reconfigurable solutions for accelerating a CNN using field-programmable gate arrays (FPGAs) involve instantiating a single architectural template that is sub-optimal to those layers. We propose exploiting dynamically reconfigurable hardware to solve this problem. By dynamically switching the hardware configuration to a structure optimized for each layer, the efficiency, such as the memory usage, can be substantially improved.

14:14
Jin Liu (Osaka University, Japan)
Masahide Hatanaka (Osaka University, Japan)
Takao Onoye (Osaka University, Japan)
A Collision Mitigation Method on Spatial Reuse for WLAN in a Dense Residential Environment
SPEAKER: Jin Liu

ABSTRACT. Recently, due to the popularity of wireless communication services, more and more access points (APs) are being extensively deployed to service an ever-increasing number of mobile stations (STAs). In such dense scenarios, unsystematic co-channel deployment may lead to poor network performance. The latest wireless local area network (WLAN) IEEE 802.11ax standard intends to improve spatial reuse to enhance the overall throughput in dense scenarios and thus accommodate increased mobile traffic. However, aggressively enhancing spatial reuse may increase the collision probability rather than improve throughput. In this study, we propose a novel dynamic sensitivity control (DSC) algorithm to improve spatial reuse while mitigating collisions caused by multi-transmission. The achieved trade-off between spatial reuse and collisions yields a throughput gain of 11% using DSC.

14:16
Tomoya Fujii (Tokyo Institute of Technology, Japan)
Shimpei Sato (Tokyo Institute of Technology, Japan)
Hiroki Nakahara (Tokyo Institute of Technology, Japan)
A Design Algorithm for a Neuron Pruning Toward a Compact Binarized Deep Convolution Neural Network on an FPGA
SPEAKER: Tomoya Fujii

ABSTRACT. For a pre-trained deep convolutional neural network (CNN) in an embedded system, high speed and low power consumption are required. The former part of the CNN consists of convolutional layers, while the latter part consists of fully connected layers. In the convolutional layers, the multiply-accumulate operation is a bottleneck, while in the fully connected layers, memory access is a bottleneck. The binarized CNN has been proposed to realize many multiply-accumulate circuits on the FPGA, so that the convolutional layers can be computed at high speed. However, even if we apply binarization to the fully connected layers, the amount of memory is still a bottleneck. In this paper, we propose a neuron pruning technique which eliminates a large part of the weight memory, and apply it to the binarized fully connected layers of the CNN. In that case, since the weight memory is realized by on-chip memory on the FPGA, it achieves high-speed memory access. To further reduce the memory size, we retrain the CNN after neuron pruning. In this paper, we propose a design algorithm for neuron pruning with retraining. In the experiment, we measured the number of neurons for the original CNN, as for the 99 [...] Compared with the ARM Cortex-A57, it was 1773.0 times faster, it dissipated 3.2 times lower power, and its performance per power efficiency was 5781.3 times better. Also, compared with the Maxwell GPU, it was 11.1 times faster, it dissipated 7.7 times lower power, and its performance per power efficiency was 84.1 times better. Thus, the binarized CNN on the FPGA is suitable for embedded systems.

14:18
Daijiro Murooka (The University of Kitakyushu, Japan)
Xuechen Zang (The University of Kitakyushu, Japan)
Taisei Kubo (The University of Kitakyushu, Japan)
Yasuhiro Takashima (The University of Kitakyushu, Japan)
Shigetoshi Nakatake (The University of Kitakyushu, Japan)
Post-silicon Skew Tuning by Programmable Delay Element with Variability Analysis
SPEAKER: Xuechen Zang

ABSTRACT. Post-silicon tuning that introduces programmable delay elements (PDEs) to mitigate the effect of manufacturing variability on delay is promising. This work presents a novel PDE based on channel-length decomposition, and reveals that it contributes to low power and low variability compared with a conventional inverter-chain type. In addition, in a model of a clock tree with the PDEs, we propose a mechanism for post-silicon tuning of the skew between a pair of flip-flops by a multilevel DLL employing our PDEs with multiple delay steps. In experiments, our proposed mechanism provides high tunability even under the variability of the PDE itself. Furthermore, we demonstrate that our mechanism can be extended to clock skew distribution to reduce the peak current.

14:20
Takumi Egawa (Kyoto University, Japan)
Tohru Ishihara (Kyoto University, Japan)
Hidetoshi Onodera (Kyoto University, Japan)
Akihiko Shinya (NTT, Japan)
Shota Kita (NTT, Japan)
Kengo Nozaki (NTT, Japan)
Kenta Takata (NTT, Japan)
Masaya Notomi (NTT, Japan)
A Method of Minimizing Latency in Large Fan-In Optical Logic Circuits with Integrated Nanophotonic Technologies
SPEAKER: Takumi Egawa

ABSTRACT. Optical circuits constructed using nanophotonic logic gates have attracted significant attention due to their ultra-low-latency operation. This paper first introduces optical implementations of primitive logic gates such as AND, OR, and XOR. Then, the paper proposes a method of minimizing latency in large fan-in optical logic circuits composed of the primitive optical logic gates. The method improves the extinction ratio, mitigates the degradation of signal power in the final output, and reduces the propagation delay of the circuit. The method reduces the order of the minimized delay from linear to logarithmic in the number of inputs.

14:22
Ping Lei (Waseda University, Japan)
Shinji Kimura (Waseda University, Japan)
RF-SM: Random Forest Training Process Acceleration with Subsampling Method on FPGA
SPEAKER: Ping Lei

ABSTRACT. Big data and machine learning algorithms have raised great interest in the hardware acceleration field in recent years. The high performance of FPGA calculations can overcome power constraints and can be used in data center acceleration. Random Forest (RF) is a well-known machine learning algorithm used in classification and prediction. Though RF has been implemented on FPGAs and GPUs by some studies to accelerate the training process, the processing speed remains a bottleneck because of the huge amount of data. To solve this problem, we propose a subsampling method combined with an FPGA for the acceleration of the Random Forest training process. This offloads the computation-intensive part to hardware to optimize the training process. This work implements the design using C in Vivado HLS for a Xilinx Kintex 7 FPGA. We select a subsampling ratio of 10% for the training data set. With the subsampling method, the average speedup reaches 10.24× over the original hardware implementation with 100,000 instances. The classification accuracy in the evaluation stays at around 90%. The result shows an ideal acceleration while maintaining a satisfactory accuracy.

14:24
Yusuke Nozaki (Meijo University, Japan)
Masaya Yoshikawa (Meijo University, Japan)
Power Analysis Method for a Lightweight Cipher Midori
SPEAKER: Yusuke Nozaki

ABSTRACT. Lightweight ciphers, which can be implemented with low power, small area, and low latency, have attracted attention. Midori is a lightweight cipher for ultra-low power. On the other hand, the risk of power analysis against cryptographic circuits has been reported. However, power analysis for Midori has not been studied. This study proposes a new power analysis method for Midori to verify its tamper resistance. Experiments on a field-programmable gate array show the validity of the proposed power analysis method.

14:26
Kano Akagi (Tokyo Institute of Technology, Japan)
Shimpei Sato (Tokyo Institute of Technology, Japan)
Atsushi Takahashi (Tokyo Institute of Technology, Japan)
Target Pin-Pair Selection Algorithm Using Minimum Maximum-Edge-Weight Matching for Set-Pair Routing
SPEAKER: Kano Akagi

ABSTRACT. Routing problems arising in applications such as silicon interposers are often formulated as a set-pair routing problem, in which the combination of pin-pairs to be connected is flexible. In this paper, we propose an algorithm that gives a target pin-pair set which is used to generate a length-matched routing pattern efficiently in set-pair routing. In our proposed algorithm, a target pin-pair set is formulated as a perfect matching in a complete bipartite graph. In order to obtain a better target pin-pair set efficiently, we propose an algorithm that obtains a perfect matching whose maximum edge weight is minimum. By using our algorithm, a target pin-pair set that excludes distant pin-pairs is obtained, and the lower bounds of the maximum connection length and the total connection length in set-pair routing are prevented from becoming large. The effectiveness of our algorithm is discussed using several small illustrative instances.
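As a rough illustration of the matching objective named in the abstract (a perfect matching whose maximum edge weight is minimum), the following Python sketch solves a small instance. It is not the authors' algorithm: it assumes a square weight matrix for a complete bipartite graph and uses a textbook approach, a binary search over edge-weight thresholds combined with augmenting-path matching. The function names and the example weights are hypothetical.

```python
from typing import List, Optional, Tuple


def _perfect_matching_within(
    weights: List[List[float]], limit: float
) -> Optional[List[Tuple[int, int]]]:
    """Return a perfect matching that uses only edges with weight <= limit, or None."""
    n = len(weights)
    match_right = [-1] * n  # match_right[j] = left vertex currently matched to right j

    def try_augment(i: int, seen: List[bool]) -> bool:
        # Standard augmenting-path search (Kuhn's algorithm) from left vertex i.
        for j in range(n):
            if weights[i][j] <= limit and not seen[j]:
                seen[j] = True
                if match_right[j] == -1 or try_augment(match_right[j], seen):
                    match_right[j] = i
                    return True
        return False

    for i in range(n):
        if not try_augment(i, [False] * n):
            return None
    return sorted((match_right[j], j) for j in range(n))


def bottleneck_matching(
    weights: List[List[float]],
) -> Optional[Tuple[float, List[Tuple[int, int]]]]:
    """Minimize the largest edge weight over all perfect matchings.

    Returns (bottleneck value, list of (left, right) pairs), or None for an
    empty matrix. Feasibility is monotone in the threshold, so binary search
    over the sorted distinct weights is valid.
    """
    thresholds = sorted({w for row in weights for w in row})
    lo, hi = 0, len(thresholds) - 1
    best = None
    while lo <= hi:
        mid = (lo + hi) // 2
        pairs = _perfect_matching_within(weights, thresholds[mid])
        if pairs is not None:
            best = (thresholds[mid], pairs)
            hi = mid - 1
        else:
            lo = mid + 1
    return best


# Hypothetical instance: weights[i][j] is the distance from source pin i to target pin j.
example = [[4, 1, 3], [2, 0, 5], [3, 2, 2]]
result = bottleneck_matching(example)
```

In this small instance, the pairing with the smallest total weight may differ from the pairing with the smallest maximum weight; the sketch optimizes only the latter, which is the property the abstract associates with excluding distant pin-pairs.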

14:28
Loo Shean Liu (National Central University, Taiwan)
Hsin-Ju Hsu (National Central University, Taiwan)
Hao-Yu Chang (National Central University, Taiwan)
Chien-Nan Jimmy Liu (National Central University, Taiwan)
Jing-Yang Jou (National Central University, Taiwan)
Hardware Implementation of WDF-Based Analog Circuit Emulation
SPEAKER: Hsin-Ju Hsu

ABSTRACT. System verification is still a big challenge for SoC designs with AMS circuits. A wave digital filter (WDF) based method has been shown to be a possible approach to emulating AMS circuits in a digital environment [4]. Although several WDF modeling approaches have been studied to transform a given circuit into its corresponding WDF structure, how to implement the WDF structure on an FPGA is often not discussed. Therefore, this paper focuses on the hardware implementation issues of WDF-based analog emulators. Besides the fixed-point implementation of traditional WDF components and adaptors, the approach to implementing the proposed LUT-based model and J-type adaptor is also discussed. The FPGA results on several cases demonstrate the feasibility of the proposed WDF-based analog emulation platform.

14:30
Toshitaka Ito (Hiroshima City University, Japan)
Yuri Itotani (Hiroshima City University, Japan)
Shin'Ichi Wakabayashi (Hiroshima City University, Japan)
Shinobu Nagayama (Hiroshima City University, Japan)
Masato Inagi (Hiroshima City University, Japan)
An FPGA-based Nearest Neighbor Search Engine Using Distance-based Hashing for High-Dimensional Data
SPEAKER: Toshitaka Ito

ABSTRACT. This paper proposes an FPGA-based nearest neighbor search engine for high-dimensional data, in which nearest neighbor search is performed based on flexible distance-based hashing (FDH). For a given query, FDH returns a small-sized candidate set of nearest neighbors, and the one closest to the query is selected as the final result. The proposed hardware search engine performs nearest neighbor search in a pipelined manner so that search results can be obtained in a short execution time. Preliminary experiments show the effectiveness and efficiency of the proposed engine.
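The candidate-then-select flow described in the abstract can be sketched in software as follows. This is a simplified illustration only, not the FDH scheme or the pipelined hardware from the paper: it assumes each hash bit records whether a point lies outside a sphere around a pivot point, takes the sphere radius as the median distance from the pivot to the data, and uses only the query's own bucket as the candidate set. The class name, the pivot choice, and the sample data are all hypothetical.

```python
import math
from collections import defaultdict
from statistics import median


def dist(a, b):
    """Euclidean distance between two equal-length coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class DistanceHashIndex:
    """Toy distance-based hash index (assumed design, for illustration only)."""

    def __init__(self, points, pivots):
        self.points = list(points)
        self.pivots = list(pivots)
        # Assumption: one radius per pivot, set to the median pivot-to-data distance.
        self.radii = [
            median([dist(p, c) for p in self.points]) for c in self.pivots
        ]
        self.buckets = defaultdict(list)
        for idx, p in enumerate(self.points):
            self.buckets[self._key(p)].append(idx)

    def _key(self, p):
        # One bit per pivot: True if p lies outside that pivot's sphere.
        return tuple(dist(p, c) > r for c, r in zip(self.pivots, self.radii))

    def candidates(self, query):
        """Indices of stored points that hash to the same bucket as the query."""
        return list(self.buckets.get(self._key(query), []))

    def nearest(self, query):
        """Pick the candidate closest to the query; scan all points if the bucket is empty."""
        cand = self.candidates(query) or range(len(self.points))
        return min(cand, key=lambda i: dist(self.points[i], query))


# Hypothetical data: two well-separated clusters in two dimensions.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
index = DistanceHashIndex(data, pivots=[(0, 0), (10, 10)])
hit = index.nearest((0.9, 0.2))
```

Because only one bucket is examined, the result is approximate in general: the true nearest neighbor may fall in a neighboring bucket when the query lies close to a sphere boundary.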

14:32
Sayaka Terashima (Keio University, Japan)
Takuya Kojima (Keio University, Japan)
Hayate Okuhara (Keio University, Japan)
Yusuke Matsushita (Keio University, Japan)
Naoki Ando (Keio University, Japan)
Mitaro Namiki (Graduate School of Technology, Tokyo University of Agriculture and Technology, Japan)
Hideharu Amano (Keio University, Japan)
A Shared Memory Chip for Twin-Tower of Chips

ABSTRACT. A shared memory chip for the building-block computing system using Thru-Chip Interface (TCI) is developed and evaluated. The implemented memory chip can connect two 3D stacked chip blocks via TCI. Therefore, by using it as a bridge between these two blocks, a new 3D integrated System in a Package (SiP) with a twin-tower of chips can be realized. In this paper, we present an architecture of the twin-tower using a shared memory and evaluate its performance improvement. In our evaluation, the twin-tower system improves system performance by 35% compared to the conventional building-block computing system.

14:34
Tetsuo Miyauchi (Japan Advanced Institute of Science and Technology, Japan)
Kiyofumi Tanaka (Japan Advanced Institute of Science and Technology, Japan)
Building a Framework for an Application-Adaptive Processor System on FPGA-based SoC

ABSTRACT. We have been studying a framework for highly application-adaptive systems. As the Internet of Things (IoT) becomes widespread, the resources that an application program can use are often restricted. The use of a real-time operating system (RTOS) is effective for developing embedded applications efficiently. However, it increases the overhead in memory usage (footprint) and execution time. Therefore, it is desirable to adapt the RTOS to the target system to reduce this overhead. In this paper, we propose a method of implementing some RTOS functions with hardware mechanisms, as a part of our framework for building highly application-adaptive systems.

14:36
Zih-Ming Yeh (Chung Yuan Christian University, Taiwan)
Wei-Kai Cheng (Chung Yuan Christian University, Taiwan)
Hybrid Cross Mesh Synthesis with Register Clustering
SPEAKER: Wei-Kai Cheng

ABSTRACT. In clock network design, the trade-off between power consumption and timing closure has long been a main issue. To address it, a hybrid clock network architecture that combines a clock tree and a clock mesh seems to be a promising solution. In this paper, we propose a novel clock mesh architecture, the cross mesh. By dispersing the overall driving force evenly, our methodology creates small non-zero-skew clock trees. In addition, we integrate a clock gating technique to further reduce dynamic power consumption. Experimental results show that our approach obtains feasible solutions and effectively improves both power consumption and clock skew.

14:38
Kiyoharu Hamaguchi (Shimane University, Japan)
Yosuke Kakiuchi (Hiroshima Institute of Technology, Japan)
Ryosuke Takakura (Shimane University, Japan)
Applying Bayesian Network-Based Machine Learning to Regression Design Verification

ABSTRACT. We show how coverage-driven verification of RTL designs can be improved with Bayesian Network-based machine learning, in the context of regression verification. The experiments show that toggle coverage can be improved in terms of the number of simulation runs, in particular when we focus on value changes over two clock cycles.

14:40
Yoshiki Tsuchida (Ritsumeikan University, Japan)
Haruya Fujii (Ritsumeikan University, Japan)
Tomoki Abe (Ritsumeikan University, Japan)
Lei Lin (Ritsumeikan University, Japan)
Masahiro Fukui (Ritsumeikan University, Japan)
Motor Modeling, and Simulation of the Driving Performance of an EV-cart with the Motor
SPEAKER: Haruya Fujii

ABSTRACT. In recent years, electric vehicles (EVs) have been spreading rapidly because of energy and environmental problems. However, since the travel distance of a typical electric car is shorter than that of a gasoline car, various improvements are required. In this paper, we model an EV and optimize it for EV minicar tracing. The acceleration method was evaluated by simulation and actual running, and the current control during acceleration was optimized.

14:42
Kosei Yamaguchi (Ritsumeikan University, Japan)
Takashi Imagawa (Ritsumeikan University, Japan)
Hiroyuki Ochi (Ritsumeikan University, Japan)
Routing Method Considering Programming Constraint of Reconfigurable Device Using Via-switch Crossbars

ABSTRACT. This paper proposes a new routing method that considers a constraint on the programming of switches in a reconfigurable architecture using via-switches. Compared with the programmable switch in conventional FPGAs, which consists of an SRAM cell and a pass transistor, the via-switch is smaller in area, on-resistance, and parasitic capacitance, which is expected to yield a dramatic improvement in performance and energy consumption. To take advantage of the via-switch, a crossbar structure is used that consists of an array of via-switches with programming control circuitry on the edges. However, multiple switches in the same row of a crossbar cannot be set to the on-state. This necessitates an algorithm for deriving a routing that satisfies the constraint, but if the constraint is added to the A* algorithm, a general method for finding shortest paths, it can fail to find the paths. The proposed method successfully finds the paths by storing the best two paths as candidates for exploration. In our experiments, routing is successfully completed with 1.4% fewer total hops than with the conventional A* algorithm in the best case.

14:44
Jukiya Furushima (The University of Aizu, Japan)
Tatsuki Otake (The University of Aizu, Japan)
Hiroshi Saito (The University of Aizu, Japan)
Performance Optimization by Placement Constraints for FPGA-based Asynchronous Processors

ABSTRACT. In this paper, we propose a performance optimization method for Field Programmable Gate Array (FPGA)-based asynchronous processors using placement constraints. The proposed method consists of a two-step placement constraint generation. The first step is based on floorplanning and reduces path delays among resources. The second step reduces control-path delays inside control modules and delay elements, considering the structure of the FPGA. The use of the generated placement constraints results not only in performance optimization but also in fewer delay adjustments, because delay variation is stabilized. In the experiment, we design asynchronous MIPS processors using the proposed method. The experimental result shows that performance is optimized with fewer delay adjustments by the proposed method.

14:46
Kaoru Furumi (Hirosaki University, Japan)
Shintaro Okamoto (Hirosaki University, Japan)
Toshiki Kanamoto (Hirosaki University, Japan)
Masashi Imai (Hirosaki University, Japan)
Atsushi Kurokawa (Hirosaki University, Japan)
Impact of Distributing 3D Stacked ICs on Maximum Temperature Reduction
SPEAKER: Kaoru Furumi

ABSTRACT. Thermal management has become a critical challenge for 3D ICs. In this paper, we present an effective method for reducing temperatures by distributing the chips of a general 3D IC structure, which are stacked in one place, to multiple places. To evaluate the effects of distribution, a normal structure and a structure with thermal-SIB (thermal sidewalls, interchip plates, and a bottom plate) are used. We also present results of thermo-mechanical stress analysis. Experimental results demonstrate that by distributing chips from one place to multiple places in a 3D IC, temperature increases can be reduced dramatically, the structure with thermal-SIB can further reduce the temperature, and the thermo-mechanical stress can also be reduced.

16:00-16:50 Session I4: Invited Talk IV
Chair:
Kiyoharu Hamaguchi (Shimane University, Japan)
16:00
Jun Deguchi (Toshiba Memory Corp., Japan)
Time-Domain Neural Network for Deep Learning Inference
SPEAKER: Jun Deguchi

ABSTRACT. Demand for highly energy-efficient coprocessors for the inference computation of deep neural networks is increasing, because many edge devices are moving toward implementing some form of AI or deep learning that is performed locally, owing to privacy/security concerns and the large latency/power consumption of communication with the cloud. In this talk, the time-domain neural network (TDNN), which employs time-domain analog and digital mixed-signal processing (TDAMS) that uses delay time as the analog signal, is introduced. TDNN not only exploits energy-efficient analog computing, but also enables a fully spatially unrolled architecture by virtue of the hardware-efficient nature of TDAMS. The architecture reduces the energy-hungry movement of weights and activations from memory devices to processing elements, thus contributing to a significant improvement in energy efficiency. Also, useful training techniques that mitigate the non-ideal effects of analog circuits are introduced, which make it possible to simplify the circuits and maximize energy efficiency. The proof-of-concept chip shows an unprecedentedly high energy efficiency of 48.2 TSop/s/W.