ARCS 2024: 37TH GI/ITG INTERNATIONAL CONFERENCE ON ARCHITECTURE OF COMPUTING SYSTEMS 2024
PROGRAM FOR THURSDAY, MAY 16TH

09:00-10:15 Session 11: Computer Architecture Co-Design II
Location: Room 3.06.H01
09:00
Accelerating WebAssembly Interpreters in Embedded Systems through Hardware-Assisted Dispatching

ABSTRACT. WebAssembly is a promising bytecode virtualization technology for embedded systems. WebAssembly interpreters for embedded systems offer strong isolation and portability. However, they come with a significant performance penalty compared to direct bare-metal execution or compiled WebAssembly code. This creates demand for interpreter optimization. In this work, we present an approach that increases the execution speed of interpreted WASM code by offloading computed GOTOs, as used in interpreters, to a hardware accelerator. We describe the accelerator’s hardware design as well as its integration into the popular WebAssembly Micro Runtime (WAMR). To prove the functionality and effectiveness of our approach, we integrate the accelerator into an open-source RISC-V processor core and evaluate WASM interpretation using different benchmarks. Our results show that the approach significantly increases execution speed while creating only minimal code size overhead. Finally, we give an outlook on possible future improvements of the accelerator.
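
The dispatch technique the accelerator targets, computed GOTOs, is a standard interpreter idiom. A minimal sketch in C (using the GCC labels-as-values extension and a hypothetical three-opcode bytecode, not WAMR's actual instruction set) could look as follows:

    #include <stdio.h>

    /* Hypothetical toy bytecode; not the WebAssembly instruction set. */
    enum { OP_PUSH1, OP_ADD, OP_HALT };

    static int run(const unsigned char *code)
    {
        /* Dispatch table: one handler label per opcode (GCC labels-as-values). */
        static void *dispatch[] = { &&op_push1, &&op_add, &&op_halt };
        int stack[16], sp = 0;

        /* Computed GOTO: jump straight to the handler of the next opcode;
           this indirect jump is what a hardware dispatcher could offload. */
        #define NEXT() goto *dispatch[*code++]

        NEXT();
    op_push1:
        stack[sp++] = 1;
        NEXT();
    op_add:
        sp--;
        stack[sp - 1] += stack[sp];
        NEXT();
    op_halt:
        return stack[sp - 1];
    }

    int main(void)
    {
        const unsigned char prog[] = { OP_PUSH1, OP_PUSH1, OP_ADD, OP_HALT };
        printf("%d\n", run(prog));   /* prints 2 */
        return 0;
    }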

09:25
Exploring the ARM Core Mesh Network Topology

ABSTRACT. The continuously rising number of cores per socket puts a growing demand on on-chip interconnects. The topology of these interconnects is largely kept transparent to the user, yet it can be the source of measurable performance differences for large many-core processors due to core placement on that interconnect. This paper investigates the ARM Core Mesh Network (CMN) on an Ampere Altra Max processor. We provide novel insights into the interconnect by experimentally deriving key information on the CMN topology, such as the position of cores or memory and cache controllers. Based on this insight, we evaluate the performance characteristics of several benchmarks and tune the thread-to-core mapping to improve application performance. Our methodology is directly applicable to all ARM-based processors using the ARM CMN, and in principle applies to all mesh-based on-chip networks.
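
Thread-to-core mapping of the kind tuned in the paper is typically applied through CPU affinity. A minimal Linux sketch (the core ID below is a placeholder, not a mapping derived from the CMN topology):

    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>
    #include <stdio.h>
    #include <string.h>

    /* Pin the calling thread to one logical core; returns 0 on success. */
    static int pin_to_core(int core_id)
    {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(core_id, &set);
        return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    }

    int main(void)
    {
        /* Placeholder core ID: in practice one would pick a core close to the
           memory or cache controller it communicates with on the mesh. */
        int rc = pin_to_core(4);
        if (rc != 0)
            fprintf(stderr, "pinning failed: %s\n", strerror(rc));
        else
            printf("pinned to core 4\n");
        return 0;
    }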

09:50
Comparison of a Binary Signed-Digit Adder with conventional Binary Adder Circuits on Layout Level

ABSTRACT. Binary signed-digit (BSD) adders offer the ability to do addition in constant time, independent of the operand width. This is done by allowing each digit to take three different values (one more than in standard binary encoding), storing an arising carry bit at one position in the subsequent digit. This prevents carry propagation throughout the entire circuit. Previous work has evaluated one type of BSD adder encoding (BSD-SUM) regarding its speedup over conventional adder architectures at the logic synthesis level. This paper builds upon that work, extending it to the layout level for a more detailed view of area and power consumption overhead. We compare the BSD-SUM adder with three conventional binary adder architectures, namely Carry-Lookahead and Ripple-Carry circuits and the circuitry resulting from a + operator in HDL code, and evaluate multiple typical operand widths. We found that the BSD-SUM adder architecture offers an area benefit over the Carry-Lookahead architecture for wide operand widths (64 bit and greater). We also found that the power consumption overhead of this architecture is quadratic when compared to binary adder architectures.
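
The carry-free property of signed-digit addition can be illustrated with a small, generic sketch in C (the classic two-step radix-2 signed-digit addition with digits -1/0/1; the BSD-SUM encoding evaluated in the paper may represent digits differently):

    #include <stdio.h>

    #define N 8  /* operand width in signed digits */

    /* Carry-free addition of two binary signed-digit numbers.
     * Digits are -1, 0 or 1; index 0 is the least significant digit.
     * Each position only inspects the digits one position below it,
     * so the logic depth does not grow with the operand width. */
    static void bsd_add(const int x[N], const int y[N], int s[N + 1])
    {
        int w[N], t[N + 1];
        t[0] = 0;

        /* Step 1 (all positions in parallel in hardware): split x[i] + y[i]
         * into an interim digit w[i] and a transfer t[i+1]. */
        for (int i = 0; i < N; i++) {
            int p = x[i] + y[i];
            int below_nonneg = (i == 0) || (x[i - 1] >= 0 && y[i - 1] >= 0);
            if (p == 2)        { t[i + 1] = 1;  w[i] = 0; }
            else if (p == 1)   { if (below_nonneg) { t[i + 1] = 1;  w[i] = -1; }
                                 else              { t[i + 1] = 0;  w[i] = 1;  } }
            else if (p == 0)   { t[i + 1] = 0;  w[i] = 0; }
            else if (p == -1)  { if (below_nonneg) { t[i + 1] = 0;  w[i] = -1; }
                                 else              { t[i + 1] = -1; w[i] = 1;  } }
            else /* p == -2 */ { t[i + 1] = -1; w[i] = 0; }
        }

        /* Step 2: absorb the transfer; by construction w[i] + t[i] stays in
         * {-1, 0, 1}, so no further carry can propagate. */
        for (int i = 0; i < N; i++)
            s[i] = w[i] + t[i];
        s[N] = t[N];
    }

    static long value(const int *d, int n)  /* decode digits to an integer */
    {
        long v = 0;
        for (int i = n - 1; i >= 0; i--)
            v = 2 * v + d[i];
        return v;
    }

    int main(void)
    {
        int x[N] = { 1, -1, 0, 1, 0, 0, 1, 0 };   /* 71 */
        int y[N] = { 1, 1, 1, -1, 1, 0, 0, 0 };   /* 15 */
        int s[N + 1];
        bsd_add(x, y, s);
        printf("%ld + %ld = %ld\n", value(x, N), value(y, N), value(s, N + 1));
        return 0;
    }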

10:15-10:45 Coffee Break
10:45-11:45 Session 12: Workshop on Dependability and Fault Tolerance
Location: Room 3.06.H01
10:45
Software-based Erasure-tolerant Coding with Buffering and Compression

ABSTRACT. Erasure-tolerant coding is applied in data storage systems and also in data transmission to tolerate the loss of parts of the data due to failures. In this paper we focus on a particular coding variant that we call compressed low-rate codes. We report on properties of these codes, derive requirements for an implementation, and report on the compression results that we achieved with our software-based system. A particular finding is the necessity of buffer management for the data prior to encoding and for the data produced in the encoded data streams. Another result is the experimental evaluation of the storage overhead: the overhead reduction by compression predicted in theory could be observed in real-data experiments, although the lower bound of the reduction was not reached.
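
As a strongly simplified illustration of erasure tolerance (a single XOR parity block rather than the compressed low-rate codes studied in the paper), any one lost data block can be rebuilt from the surviving blocks and the parity:

    #include <stdio.h>
    #include <string.h>

    #define K     4        /* number of data blocks           */
    #define BLOCK 9        /* block size in bytes (toy value) */

    /* Encode: the parity block is the XOR of all K data blocks. */
    static void encode(const unsigned char data[K][BLOCK], unsigned char parity[BLOCK])
    {
        memset(parity, 0, BLOCK);
        for (int i = 0; i < K; i++)
            for (int j = 0; j < BLOCK; j++)
                parity[j] ^= data[i][j];
    }

    /* Decode: rebuild one erased block by XOR-ing the parity with the survivors. */
    static void rebuild(unsigned char data[K][BLOCK], int lost,
                        const unsigned char parity[BLOCK])
    {
        memcpy(data[lost], parity, BLOCK);
        for (int i = 0; i < K; i++)
            if (i != lost)
                for (int j = 0; j < BLOCK; j++)
                    data[lost][j] ^= data[i][j];
    }

    int main(void)
    {
        unsigned char data[K][BLOCK] = { "block000", "block001", "block002", "block003" };
        unsigned char parity[BLOCK];

        encode(data, parity);
        memset(data[2], 0, BLOCK);          /* simulate the loss of one block */
        rebuild(data, 2, parity);
        printf("recovered: %s\n", (char *)data[2]);   /* prints block002 */
        return 0;
    }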

11:10
Determination of Optimal H-Matrices for 2-Bit Error Correcting Codes

ABSTRACT. 2-bit error correction for random errors in memory cells is of growing importance. Instead of the 1-bit error correcting and 2-bit error detecting Hsiao-code, a 2-bit error correcting BCH-code can be used, with the disadvantage that for the maximal code length of $2^m - 1$ the number of necessary check bits is $2 \cdot m$. Compared to the $m$ check bits needed for the commonly implemented Hsiao-code, the number of check bits is doubled and the overhead necessary for error correction is relatively high. To reduce this overhead, it is of interest to determine 2-bit error correcting codes of maximal length for a given number of check bits.
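
As an illustrative calculation (the value of $m$ here is chosen arbitrarily and is not taken from the paper), the check-bit overhead of a maximal-length 2-bit error correcting BCH-code is, for $m = 7$:

    \[
      n = 2^{m} - 1 = 127, \qquad r = 2 \cdot m = 14, \qquad k = n - r = 113, \qquad
      \frac{r}{k} = \frac{14}{113} \approx 12.4\,\%.
    \]

Every column that can be added to the H-matrix for the same number of check bits increases $k$ and thus lowers this relative overhead.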

In this paper it is shown for the first time that the code length of a 2-bit error correcting BCH-code cannot be enlarged by adding a further column to its H-matrix. (The proof is based on the fact that a 2-bit error correcting BCH code is quasi-perfect.) A similar result also holds for a 3-bit error correcting and 3-bit error detecting BCH-code with included parity.

For up to 8 check bits, H-matrices for codes with maximal code length are determined. For larger numbers of check bits, H-matrices with almost optimal code length are determined by a new computer-search algorithm based on detailed properties of the columns of the corresponding H-matrices in their separated form.
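
A naive greedy variant of such a column search (requiring any four columns of H to be linearly independent over GF(2), the condition for 2-bit error correction; this is only a sketch, not the separated-form algorithm of the paper) could look as follows in C:

    #include <stdio.h>

    #define R     8      /* number of check bits: columns are R-bit values  */
    #define MAXN  64     /* upper bound on the number of columns we collect */

    /* Return 1 if candidate column c keeps every subset of up to 4 columns
     * linearly independent, i.e. c is not the XOR of 0, 1, 2 or 3 columns. */
    static int candidate_ok(const unsigned cols[], int n, unsigned c)
    {
        if (c == 0)
            return 0;
        for (int i = 0; i < n; i++) {
            if (c == cols[i])
                return 0;
            for (int j = i + 1; j < n; j++) {
                if (c == (cols[i] ^ cols[j]))
                    return 0;
                for (int k = j + 1; k < n; k++)
                    if (c == (cols[i] ^ cols[j] ^ cols[k]))
                        return 0;
            }
        }
        return 1;
    }

    int main(void)
    {
        unsigned cols[MAXN];
        int n = 0;

        /* Try all nonzero R-bit columns in increasing order and keep those
         * that preserve the independence condition. */
        for (unsigned c = 1; c < (1u << R) && n < MAXN; c++)
            if (candidate_ok(cols, n, c))
                cols[n++] = c;

        printf("greedy code length for r = %d check bits: n = %d\n", R, n);
        return 0;
    }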