ARCS2023: 36TH GI/ITG INTERNATIONAL CONFERENCE ON ARCHITECTURE OF COMPUTING SYSTEMS
PROGRAM FOR TUESDAY, JUNE 13TH

12:15-13:15 Session 1: Keynote #1

Speaker: Dionisios Pnevmatikatos

Title: Reconfigurable technologies in HPC and data-centers: challenges and opportunities

Abstract: Reconfigurable technology has been successfully showcased in several computationally intensive applications that exploit the underlying adaptability to extract performance. When applying this technology to more general environments (HPC and/or data-centers), the necessary tradeoffs and performance tuning are more challenging. In this talk I will describe the current state of play in the field and our activities and progress in these two settings. For the HPC environment we are building a set of open-source libraries for typical HPC kernels, starting from basic ones (BLAS L1) and gradually moving towards the more involved, interesting and difficult ones, e.g. BLAS L2 & L3, SpMV, Jacobi, LU decomposition. For deploying accelerators at the data center we are building a substrate able to flexibly support multi-tenancy while ensuring isolation of the data accessed by the accelerators. Scalability in the number of accelerators and support for high-performance memory systems (with multiple channels) are provided via a properly dimensioned NoC.
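As a rough illustration of why kernels such as SpMV are considered "more involved" than BLAS L1 routines, the sketch below shows a minimal CSR sparse matrix-vector multiply. This is illustrative only and not code from the libraries mentioned in the abstract; the data-dependent indirection in the inner loop is what makes such kernels harder to accelerate than dense vector operations.

```python
# Minimal CSR (compressed sparse row) SpMV: y = A @ x.
# Illustrative sketch only -- not the project's actual library code.

def spmv_csr(values, col_idx, row_ptr, x):
    """Multiply a CSR-encoded sparse matrix by a dense vector x."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        # Irregular, data-dependent inner loop: this indirection through
        # col_idx is what distinguishes SpMV from dense BLAS kernels.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y

# 3x3 matrix [[2, 0, 1], [0, 3, 0], [4, 0, 5]] in CSR form:
values  = [2.0, 1.0, 3.0, 4.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
print(spmv_csr(values, col_idx, row_ptr, [1.0, 1.0, 1.0]))  # [3.0, 3.0, 9.0]
```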

Bio: Dionisios Pnevmatikatos (BSc U. Crete '89, PhD UW-Madison '95) is Professor at the School of ECE of the National Technical University of Athens and a member of the Computing Systems Lab (CSLab). Before that he was a Professor of ECE at the Technical University of Crete (2000-2019) and a research associate with FORTH-ICS (1997-2019). His research interests are in the broader area of Computer Architecture, where he investigates the design and implementation of high-performance and cost-effective accelerated, heterogeneous parallel/rack-scale systems. He is the coordinator of the ongoing AERO project and has participated in and coordinated many EU-funded and several nationally funded projects in the past. He has also served on the Program Committees of, and as Program (co-)Chair for, many prestigious conferences in his field.

13:45-15:00 Session 2: Accelerating Neural Networks
13:45
Energy Efficient LSTM Accelerators for Embedded FPGAs through Parameterised Architecture Design

ABSTRACT. Long Short-Term Memory networks (LSTMs) are a vital Deep Learning technique suitable for performing on-device time-series analysis on local sensor data streams of embedded devices. In this paper, we propose a new hardware accelerator design for LSTMs specially optimised for resource-scarce embedded Field Programmable Gate Arrays (FPGAs). Our design improves execution speed and reduces energy consumption compared to related work. Moreover, it can be adapted to different situations using a number of optimisation parameters, such as the usage of DSPs or the implementation of activation functions. We present our key design decisions and evaluate the performance. Our accelerator achieves an energy efficiency of 11.89 GOP/s/W during real-time inference at 32,873 samples/s.
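For readers unfamiliar with the recurrence an LSTM accelerator has to implement, the sketch below shows one LSTM cell step in scalar form (hidden size 1). The weight names and placeholder values are illustrative assumptions, not the paper's design; note that the sigmoid and tanh activations are exactly the functions whose hardware implementation the abstract lists as an optimisation parameter.

```python
import math

# One LSTM cell step -- the recurrence an LSTM accelerator evaluates per
# sample. Scalar (hidden size 1) for readability; weight names and values
# are made up for illustration.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, W):
    # Four gates, each an affine transform of (x, h_prev) plus a bias,
    # followed by a nonlinear activation.
    i = sigmoid(W["Wi"] * x + W["Ui"] * h_prev + W["bi"])    # input gate
    f = sigmoid(W["Wf"] * x + W["Uf"] * h_prev + W["bf"])    # forget gate
    o = sigmoid(W["Wo"] * x + W["Uo"] * h_prev + W["bo"])    # output gate
    g = math.tanh(W["Wg"] * x + W["Ug"] * h_prev + W["bg"])  # candidate
    c = f * c_prev + i * g        # new cell state
    h = o * math.tanh(c)          # new hidden state
    return h, c

# Placeholder all-zero weights, purely to exercise the function:
W = {k: 0.0 for k in ("Wi", "Ui", "bi", "Wf", "Uf", "bf",
                      "Wo", "Uo", "bo", "Wg", "Ug", "bg")}
h, c = lstm_step(0.5, 0.0, 0.0, W)
print(h, c)  # 0.0 0.0
```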

14:10
Overcoming the ReRAM scalability challenges to optimize memory processing in NNs

ABSTRACT. As the massive usage of Artificial Intelligence (AI) techniques spreads in the economy, reducing the energy of Neural Network (NN) applications is still a major problem, mainly because the size of NNs has been steadily increasing. Analog processing techniques have been shown to increase performance and reduce energy, but their main problem is scalability, which is required for actual NN applications. Resistive RAM (ReRAM) devices can compute Matrix-Vector Multiplication (MVM) in the analog domain within O(1) time complexity, but suffer from scalability issues. In this paper, we present techniques to effectively allow the use of ReRAM in a scalable fashion, hence reducing energy in complex NNs. Experiments on real-world Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) models show that our techniques reduce the model size by up to 64x and the energy consumption by a factor of 6x, while maintaining the inference accuracy of the model.
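To see why a ReRAM crossbar performs MVM in O(1) time, recall the idealized model: each cell's conductance encodes a matrix weight, and applying input voltages on the rows yields, by Kirchhoff's current law, column currents that are exactly the matrix-vector product, accumulated in a single analog step. The sketch below simulates that idealized behaviour digitally; it is an illustration of the principle, not the paper's technique.

```python
# Idealized ReRAM crossbar MVM. Conductance G[i][j] encodes weight (i, j);
# input voltages V drive the rows, and each column current is
# I[j] = sum_i V[i] * G[i][j] -- the whole MVM in one analog step.
# Purely illustrative digital simulation of the analog principle.

def crossbar_mvm(G, V):
    rows, cols = len(G), len(G[0])
    return [sum(V[i] * G[i][j] for i in range(rows)) for j in range(cols)]

G = [[1.0, 2.0],
     [3.0, 4.0]]   # conductances (matrix weights)
V = [1.0, 1.0]     # input voltages (input vector)
print(crossbar_mvm(G, V))  # column currents: [4.0, 6.0]
```

The scalability problem the abstract targets arises because large matrices must be tiled across many such crossbars, each with limited size and analog precision.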

14:35
A Comparative Study of Neural Network Compilers on ARMv8 Architecture

ABSTRACT. The deployment of Deep Neural Network (DNN) models in far edge devices is a challenging task, because these devices are characterized by scarce resources. To address these challenges, various deep learning toolkits and model compression techniques have been developed by both industry and academia. The available DNN toolchains can perform optimizations at different levels, e.g., graph-level, Intermediate Representation (IR), or machine-dependent optimizations, while they operate in an Ahead-of-Time (AOT) or Just-in-Time (JIT) manner. Although DNN toolchains are an active research area, there is no available study that analyses the performance benefits achieved by the different optimization levels, e.g., the performance boost delivered by graph-level vs. machine-dependent optimizations. This work performs a comprehensive study of three popular neural network (NN) compiler frameworks that target (mainly) far edge devices: TensorFlow Lite for MCUs, Glow, and IREE. For a fair comparison, our performance analysis aims to reveal the performance benefits offered by the different optimization levels of the three studied frameworks, as well as the strength of specific graph-level optimizations, e.g., in quantizing the input NN models. Our evaluation is based on various NN models with different computational/memory requirements, and the experiments are performed on a state-of-the-art high-performance embedded platform by Nvidia.

15:30-16:45 Session 3: Organic Computing Methodology (OC1)
15:30
A Decision-Theoretic Approach for Prioritizing Maintenance Activities in Organic Computing Systems

ABSTRACT. Organic Computing systems intended to solve real-world problems are usually equipped with various kinds of sensors and actuators in order to be able to interact with their surrounding environment. Like any kind of physical hardware component, such sensors and actuators will fail after a usually unknown amount of time. Besides the obvious task of identifying or predicting hardware failures, an Organic Computing system will furthermore be responsible for assessing whether it is still able to function after a component breaks, as well as for planning maintenance or repair actions, which will most likely involve human repair workers. Within this work, three different approaches to prioritizing such maintenance actions within the scope of an Organic Computing system are presented and evaluated.

15:55
Predicting Physical Disturbances in Organic Computing Systems using Automated Machine Learning

ABSTRACT. Robustness against internal or external disturbances is a key competence of Organic Computing Systems. A rarely discussed aspect here are physical disturbances, i.e., failures or breakdowns that affect a system's physical components. Before experiencing such a disturbance, physical components may show various measurable signs of deterioration that might be assessed through sensor data. If interpreted correctly, these signs would make it possible to predict future physical disturbances and act appropriately in order to prevent them from harming the overall system. As the actual structure of such data, as well as the behaviour that disturbances produce, might not be known a priori, it is of interest to equip Organic Computing Systems with the ability to learn to predict them autonomously. We utilize the Automated Machine Learning framework TPOT in an online-learning-inspired methodology for learning to predict physical disturbances in an iterative manner. We evaluate our approach using a freely available dataset from the broader domain of Predictive Maintenance research and show that it is able to build predictors with reasonable prediction quality autonomously.

16:20
Self-Adaptive Diagnosis and Reconfiguration in ADNA-Based Organic Computing
PRESENTER: Simon Meckel

ABSTRACT. The increasing openness and dynamism in embedded systems necessitate the continuous advancement of diagnostic methodologies, particularly in contexts where safety is paramount and system operability must persist despite faults or failures. The implementation of Organic Computing offers substantial benefits to these intricate, dynamic systems, such as decreased development effort, enhanced adaptability, and resilience. Nonetheless, safety-critical systems that must preserve functionality amid failure by maintaining a fail-operational status require additional characteristics. This paper presents approaches such as adaptive diagnostics employing neural networks for fault detection and localization, adaptive probing for fault identification, and strategies for degraded performance states and system reconfiguration to circumvent complete service disruption when computational resources are insufficient.