09:00 | Accelerating WebAssembly Interpreters in Embedded Systems through Hardware-Assisted Dispatching ABSTRACT. WebAssembly is a promising bytecode virtualization technology for embedded systems. WebAssembly interpreters for embedded systems demonstrate strong isolation and portability. However, they come with a significant performance penalty compared to direct bare-metal execution or compiled WebAssembly code. This creates demand for interpreter optimization. In this work, we present an approach that increases the execution speed of interpreted WASM code by offloading computed GOTOs, as used in interpreters, to a hardware accelerator. We describe the accelerator’s hardware design, as well as its integration into the popular WebAssembly Micro Runtime (WAMR). To prove the functionality and effectiveness of our approach, we integrate the accelerator into an open-source RISC-V processor core and evaluate WASM interpretation using different benchmarks. Our results show that the approach significantly increases execution speed while creating only minimal code size overhead. Finally, we give an outlook on possible future improvements of the accelerator. |
09:25 | Exploring the ARM Core Mesh Network Topology PRESENTER: Philipp A. Friese ABSTRACT. The continuously rising number of cores per socket puts a growing demand on on-chip interconnects. The topology of these interconnects is largely kept transparent to the user, yet they can be the source of measurable performance differences for large many-core processors due to core placement on that interconnect. This paper investigates the ARM Core Mesh Network (CMN) on an Ampere Altra Max processor. We provide novel insights into the interconnect by experimentally deriving key information on the CMN topology, such as the position of cores or memory and cache controllers. Based on this insight, we evaluate the performance characteristics of several benchmarks and tune the thread-to-core mapping to improve application performance. Our methodology is directly applicable to all ARM-based processors using the ARM CMN, but in principle applies to all mesh-based on-chip networks. |
09:50 | Comparison of a Binary Signed-Digit Adder with conventional Binary Adder Circuits on Layout Level ABSTRACT. Binary signed-digit (BSD) adders offer the ability to do addition in constant time, independent of the operand width. This is done by allowing each digit to take three different values (one more than in standard binary encoding), storing an arising carry bit at one position in the subsequent digit. Subsequently, this prevents carry propagation throughout the entire circuit. Previous work has evaluated one type of BSD adder encoding (BSD-SUM) regarding speedup over conventional adder architectures on logic synthesis level. This paper builds upon this work, extending it onto layout level for a more detailed view on area and power consumption overhead. We compare the BSD-SUM adder with three conventional binary adder architectures, namely Carry-Lookahead, Ripple-Carry circuits and the resulting circuitry of a + operator in HDL code, and evaluate multiple typical operand widths. We found that the BSD-SUM adder architecture offers an area benefit over the Carry-Lookahead architecture for wide operand widths (64 bit and greater). We also found that the power consumption overhead of this architecture is quadratic when compared to binary adder architectures. |
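The constant-time property described in the BSD adder abstract can be illustrated with a minimal Python sketch of textbook carry-free signed-digit addition. It is not the BSD-SUM encoding evaluated in the paper, whose details the abstract does not give. The function names `bsd_add` and `bsd_to_int` and the little-endian list representation are assumptions made for this illustration only. Each result digit depends on just two neighbouring digit positions, so no carry ripples across the operand.

```python
def bsd_to_int(digits):
    """Value of a little-endian list of digits drawn from {-1, 0, 1}."""
    return sum(d * (1 << i) for i, d in enumerate(digits))


def bsd_add(x, y):
    """Carry-free addition of two equal-length little-endian BSD numbers.

    Hypothetical helper for illustration; returns len(x) + 1 digits.
    """
    assert len(x) == len(y)
    n = len(x)
    p = [a + b for a, b in zip(x, y)]  # per-position sums, each in -2..2
    transfer = [0] * (n + 1)           # transfer[i] enters position i
    interim = [0] * n                  # interim sum digit at position i

    for i in range(n):
        # Look only one position down: if that sum is non-negative, the
        # incoming transfer is 0 or +1, otherwise it is 0 or -1.
        lower_nonneg = i == 0 or p[i - 1] >= 0
        if p[i] == 2:
            t, w = 1, 0
        elif p[i] == -2:
            t, w = -1, 0
        elif p[i] == 1:
            t, w = (1, -1) if lower_nonneg else (0, 1)
        elif p[i] == -1:
            t, w = (0, -1) if lower_nonneg else (-1, 1)
        else:
            t, w = 0, 0
        transfer[i + 1] = t
        interim[i] = w

    # Final digits: interim sum plus incoming transfer, always in {-1, 0, 1}.
    return [interim[i] + transfer[i] for i in range(n)] + [transfer[n]]
```

The loop is written sequentially, but every iteration reads only `p[i]` and `p[i - 1]`, so in hardware all positions could be evaluated in parallel with a fixed logic depth, which is the source of the width-independent delay.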