ARCS 2026: 39TH GI/ITG INTERNATIONAL CONFERENCE ON ARCHITECTURE OF COMPUTING SYSTEMS
PROGRAM FOR TUESDAY, MARCH 24TH

11:00-12:30 Registration and Lunch
12:45-13:45 Session 2: Keynote #1
12:45
Ubitium: From RISC-V to AI Accelerators

ABSTRACT. Ubitium fuses reconfigurable computing and out-of-order (OoO) processing into the CGRA-like Universal Processing Array (UPA). The OoO execution mode is enabled by a RISC-V frontend and uses the UPA as reservation station, execution units, and register file. Additionally, our UPA can seamlessly switch to a data-flow mode for AI and DSP acceleration. The low-overhead switch allows data to be reused between both modes. An FPGA prototype boots Linux, and a proof-of-concept 8nm test chip has been taped out.

13:45-14:00 Break
14:00-15:35 Session 3: ARCS26 Main Track 1: RISC-V and processor architectures
14:00
Leszy: A Dynamically Reconfigurable Superscalar Out-of-Order RISC-V Processor

ABSTRACT. With the rising popularity of the RISC-V ISA and its growing open-source ecosystem, it has become an interesting environment for research in computer architecture. Utilizing and contributing to this ecosystem, we developed Leszy, an open-source, superscalar out-of-order RISC-V processor with dynamic partial reconfiguration capabilities. Combining a static CPU with dynamically reconfigurable accelerators in SoCs is a well-established concept, but dynamic partial reconfiguration of the CPU itself is less researched. Therefore, in addition to design-time configurations, the CPU presented in this paper also includes mechanisms for reconfiguring the type and number of execution units at runtime. This allows the core to adapt dynamically to the given compute load.

In this paper, we present the base architecture of Leszy, which comprises a superscalar out-of-order pipeline with explicit register renaming for speculative execution. Furthermore, the hardware components added to control dynamic reconfiguration and the required extensions to the execution units are outlined. The reconfiguration can be triggered by RISC-V hint instructions, allowing a software developer to incorporate reconfiguration triggers directly into the software. An extension to the Clang compiler was developed that supports pragmas for loops in C++ code. In combination with a list of available hardware configurations and their properties, this enables the compiler to efficiently schedule the reconfigurations to minimize application runtime. Furthermore, the compiler can add additional instructions that enable the hardware to make reconfiguration decisions at runtime.

14:20
Prototyping RISC-V 48-bit Instructions for Improved Code Density

ABSTRACT. RISC-V evolves rapidly, particularly in the embedded domain, driving innovation and redefining possibilities for open-source architectures. To address challenges in code density and energy efficiency in embedded systems, this work presents a prototype implementation of a general-purpose 48-bit extension consisting of Load/Store Multiple and Load Long Immediate instructions. The extension augments the base RISC-V 32-bit instruction set while maintaining compatibility with common extensions, including IMAFC. Specification is done at the instruction set level, with implementation across the LLVM toolchain, including the clang compiler, llvm-objdump disassembler, Spike instruction set simulator, and the CV32E40X processor. Evaluation demonstrates that, in selected memory-intensive functions from musl libc, the extension achieves an average dynamic size reduction by a factor of 9.6 and a decrease in executed instruction count by 10.9. Embench IOT 2.0 benchmarking results indicate an average 2.3% improvement in dynamic size and a 3.1% reduction in dynamic instruction count. Applicability to computationally intensive embedded use cases is demonstrated in the example of Video Capsule Endoscopy. This extension highlights RISC-V’s modularity and extensibility, aligning with the needs of embedded workloads that prioritize reduced static and dynamic memory footprints.

14:40
Benchmarking Highway C++ on RISC-V: Efficiency and Limitations of Portable SIMD Wrappers

ABSTRACT. This paper evaluates the performance of the portable SIMD library Highway-C++ on the RISC-V Vector extension (RVV). As RISC-V adoption grows, the need for portable, high-performance code becomes critical. We implement two representative algorithms, a Matrix Multiplication (compute-bound) and Base64 encoding/decoding (memory/shuffle-bound), using both Highway and native RVV intrinsics. Our analysis reveals that Highway achieves near-native performance for regular patterns like Matrix Multiplication. However, for complex shuffle operations in Base64, we identify significant performance deviations at higher Register Grouping factors (LMUL ≥ 4). We demonstrate that Highway’s strict register allocation and abstraction overhead lead to increased register pressure and spilling in these scenarios. The study concludes that while Highway is excellent for portability, manual RVV optimization is still required for maximizing the utility of large vector groups.
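The link between register grouping and register pressure can be illustrated with simple arithmetic. The sketch below is not taken from the paper; it only assumes the RVV rule that the 32 architectural vector registers are combined into groups of LMUL registers, and the function name `register_groups` is hypothetical.

```python
NUM_VECTOR_REGISTERS = 32  # RVV defines architectural vector registers v0..v31


def register_groups(lmul: int) -> int:
    """Number of usable register groups for an integer LMUL (1, 2, 4 or 8)."""
    if lmul not in (1, 2, 4, 8):
        raise ValueError("this sketch only covers integer LMUL values")
    # Each operand occupies LMUL consecutive registers, so fewer operands fit.
    return NUM_VECTOR_REGISTERS // lmul


for lmul in (1, 2, 4, 8):
    print(f"LMUL={lmul}: {register_groups(lmul)} register groups")
```

At LMUL = 4 only eight groups remain, so a kernel that keeps several shuffle tables and temporaries live at once has little headroom before values must be spilled.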

15:00
XVI-V: A Scalable and Programmable RISC-V based Multi-Cluster Accelerator for Embedded Systems

ABSTRACT. XVI-V ("sixteen-five") is a scalable and programmable multiple core accelerator, featuring clusters of 4×4 RISC-V ISA Processing Element (PE) arrays, designed for enhanced hardware scalability. Unlike CGRAs or a single cluster multicore that face bottlenecks in memory and interconnects as PEs increase, the clustered architecture employed in XVI-V ensures linear scaling of hardware utilization and a stable operating frequency even with larger array sizes. Compared to a single cluster implemented on a Field-Programmable Gate Array (FPGA), XVI-V reduces the use of look-up tables by 19.31% for 32 PEs and 41.29% for 64 PEs, all while maintaining a consistent 80 MHz operating frequency, a key advantage over flattening CGRAs whose frequency degrades with more PEs. XVI-V achieves average speedups of 51× (cycle) and 3.34× (latency) over an ARM Cortex-A53 operating at 1.2 GHz.

15:20
Research Group Forum: LIBERO: A Flexible, Lightweight GDB-based Visualization Tool for RISC-V Vector Extensions

ABSTRACT. The RISC-V Vector (RVV) extension introduces powerful yet complex semantics for data-parallel execution, including dynamically sized vectors and configurable element layouts. While these features offer high performance and portability, they also complicate debugging, as existing tools, such as GDB, do not present RVV registers in a configuration-aware manner. Consequently, raw and verbose register dumps must be manually interpreted relative to the current vector register state. This paper presents LIBERO, a lightweight visualization tool integrated directly into GDB through its Python API. LIBERO augments GDB’s Text User Interface (TUI) with a custom register view that dynamically renders vector contents and configuration state during program execution while preserving the familiar GDB workflow.
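To make "configuration-aware" concrete, the following sketch decodes the fields of the RVV `vtype` control register that determine how raw register bytes should be interpreted. It is not LIBERO's code: the function names `decode_vtype` and `vlmax` are hypothetical, the field layout follows the ratified RVV 1.0 specification, the `vill` bit is ignored, and reading the register value from GDB is left out so that the example runs standalone.

```python
from fractions import Fraction

# vlmul encodings from RVV 1.0; 0b100 is reserved.
_LMUL = {
    0b000: Fraction(1), 0b001: Fraction(2), 0b010: Fraction(4), 0b011: Fraction(8),
    0b101: Fraction(1, 8), 0b110: Fraction(1, 4), 0b111: Fraction(1, 2),
}


def decode_vtype(vtype: int) -> dict:
    """Split a raw vtype value into SEW, LMUL and the agnostic flags."""
    vlmul = vtype & 0b111          # bits 2:0
    vsew = (vtype >> 3) & 0b111    # bits 5:3
    if vlmul not in _LMUL or vsew > 0b011:
        raise ValueError("reserved vtype encoding")
    return {
        "sew": 8 << vsew,                        # element width in bits
        "lmul": _LMUL[vlmul],                    # register grouping factor
        "tail_agnostic": bool((vtype >> 6) & 1),
        "mask_agnostic": bool((vtype >> 7) & 1),
    }


def vlmax(vlen_bits: int, vtype: int) -> int:
    """Maximum element count per register group: LMUL * VLEN / SEW."""
    cfg = decode_vtype(vtype)
    return int(cfg["lmul"] * vlen_bits / cfg["sew"])


cfg = decode_vtype(0b11010001)  # vma=1, vta=1, vsew=0b010, vlmul=0b001
print(cfg["sew"], cfg["lmul"], vlmax(128, 0b11010001))
```

A visualization layer could use the decoded element width and grouping factor to slice a raw register dump into correctly sized elements instead of showing every possible interpretation at once.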

15:35-16:00 Coffee Break
16:00-17:35 Session 4: Organic Computing 1: Foundations
16:00
ORCA-Next: A Graph-Based Framework for Controlled Runtime Evolution in Organic Computing
PRESENTER: Marco Hüller

ABSTRACT. Modern technical systems must not only act adaptively in increasingly dynamic environments, but also meet stringent requirements for security, traceability, and operational reliability. While Organic Computing (OC) and the Multi-Layered Observer/Controller (MLOC) architecture offer an established conceptual framework for such systems, in practice this often remains at a purely conceptual level. The lack of explicit, well-defined execution semantics leads to inconsistent implementations, making it difficult to compare, audit, and securely integrate evolutionary mechanisms during operation.

In this paper, we present ORCA-Next, a graph-based runtime framework that operationalizes the theoretical roles of OC. Instead of assuming rigid layer hierarchies, ORCA-Next transforms observation, decision, security, and evolution into loosely coupled modules within a data-driven execution graph. An invariant set of core components coordinates initialization, scheduling, message passing, and monitoring, making intervention paths and execution conditions deterministic and transparent.

We describe the formal runtime model, the various execution modes, and the mapping to established OC principles. An illustrative walkthrough based on adaptive control demonstrates the deterministic message flow and shows how ORCA-Next combines explicit security interventions and complete traceability with a high degree of modular extensibility.

16:20
Evolving Observers for Agnostic Pattern Discovery in Spatiotemporal Fields

ABSTRACT. Simulation and analysis of large-scale self-organising systems are often bottlenecked by the difficulty of identifying relevant emergent entities within raw, high-dimensional state data. Motivated by the Organic Computing paradigm and its observer/controller architecture, we present a formal framework for agnostic pattern discovery driven by the evolution of observer parameterisations. Rather than training classifiers on labelled data, we deploy a population of observer agents that evolve to identify periodic structure without prior knowledge of what patterns exist.

The framework introduces three key mechanisms. First, a dimension partition formalism specifies which axes of the observation window constitute the traces to be analysed for periodicity, enabling a unified treatment of temporal, spatial, and mixed spatiotemporal patterns. Second, a deterministic entity selection scheme encoded in each observer's genotype controls which traces are analysed, preventing post-hoc cherry-picking of favourable signals. Third, a two-phase fitness mechanism combines spectral discovery with predictive validation: observers must not only detect spectral peaks but also predict future field states, ensuring that high fitness corresponds to genuine periodic structure rather than statistical artifacts.

We validate the framework on cellular automata (CA) spanning diverse dynamical regimes, with ground truth established through both controlled initialisation and an independent graph-based pattern classification algorithm. The evolved observers successfully detect well-documented patterns (blinkers, beacons, toads, and emergent oscillators) in Conway's Game of Life and generalise across eight distinct CA rules without parameter modification. Noise environments and aperiodic dynamics correctly drive populations to extinction, confirming that the approach rejects both random fluctuations and genuinely aperiodic behaviour. The framework provides a principled, mathematically grounded approach to deriving interpretable abstractions from unknown dynamical systems.
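As a rough illustration of the spectral discovery step only, the sketch below records the state of single cells of a Game of Life blinker over time and looks for the strongest peak in a discrete Fourier transform of each trace. It is not the authors' implementation: the helper names `life_step` and `dominant_period`, the 5×5 bounded grid, and the choice of observed cells are assumptions made for this example, and neither the evolutionary search nor the predictive validation phase is modelled.

```python
import cmath


def life_step(grid):
    """One Game of Life generation on a bounded grid (cells outside are dead)."""
    rows, cols = len(grid), len(grid[0])
    nxt = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            live = sum(grid[rr][cc]
                       for rr in range(max(r - 1, 0), min(r + 2, rows))
                       for cc in range(max(c - 1, 0), min(c + 2, cols))) - grid[r][c]
            nxt[r][c] = 1 if live == 3 or (grid[r][c] and live == 2) else 0
    return nxt


def dominant_period(trace, eps=1e-9):
    """Period of the strongest non-constant DFT component, or None if flat."""
    n = len(trace)
    mean = sum(trace) / n
    centred = [x - mean for x in trace]  # remove the constant (DC) component
    best_k, best_mag = None, eps
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(centred))
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return None if best_k is None else n / best_k


# Horizontal blinker in the middle of a 5x5 grid.
grid = [[0] * 5 for _ in range(5)]
for c in (1, 2, 3):
    grid[2][c] = 1

edge_trace, centre_trace = [], []
for _ in range(16):
    edge_trace.append(grid[1][2])    # cell that toggles every generation
    centre_trace.append(grid[2][2])  # cell that stays alive throughout
    grid = life_step(grid)

print(dominant_period(edge_trace), dominant_period(centre_trace))
```

The toggling cell yields a period of two generations, while the constant centre cell yields no spectral peak at all, which shows why the choice of observed traces matters for what an observer can detect.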

16:40
Awareness Engine: A Cognitive Middleware for Explicit Self-Awareness between Agent and Environment
PRESENTER: Zhixin Huang

ABSTRACT. Modern autonomous systems operate in complex and dynamic environments, where environmental information is multimodal, temporally heterogeneous in terms of validity durations, and often exceeds the processing capacity of individual agents. Although architectural models such as MAPE-K provide a system-level abstraction, self-awareness in many existing systems is still realized implicitly within agents, which limits interpretability and effective knowledge reuse in real-world settings.

This vision paper proposes an Awareness Engine as an explicit realization of system self-awareness. The Awareness Engine is designed as an independent cognitive middleware between the environment and intelligent agents, explicitly supporting the Monitor-Analyze-Knowledge functions of the MAPE-K loop. It organizes heterogeneous environmental information into structured, queryable, and interpretable knowledge, enabling task- and role-aware cognition without requiring agents to directly perceive raw environmental information.

The proposed architecture comprises a Knowledge Layer for multimodal knowledge management and a Cognition Layer that provides a task-driven querying interface, derivation, and state monitoring. Representative realization paths based on existing techniques are discussed to demonstrate engineering feasibility. In addition, two application scenarios, namely traffic management and server cluster management, are presented to illustrate potential integration into real-world systems.

17:00
ADNA-based Organic Computing for Fail-Safe Systems Providing High Dependability: Formal Verification of ADNA Building Blocks

ABSTRACT. This work explores the formal verification of fundamental Basic Building Blocks (BBBs) for Artificial DNA–based Organic Computing (ADNA-based OC) systems. The overarching goal is to contribute to the development of highly reliable and fail-safe embedded systems, particularly for safety-critical domains such as aviation, autonomous driving, and medical devices. The ADNA concept, inspired by biological systems, enables self-organizing, fault-tolerant, and adaptive systems by integrating Self-X properties, including self-healing and self-optimization. Recent research projects indicate that ADNA-based OC approaches could become part of future automotive system architectures. Given the safety-critical nature of this application domain, rigorous verification is a fundamental requirement to ensure correct implementation and compliance with specified system requirements.

In the automotive sector in particular, the certification of safety-critical components is mandatory to guarantee functional safety and regulatory compliance. As a first step toward this goal, this work presents the formal verification of a set of commonly used fundamental BBBs. Verification is conducted using the model-checking tool UPPAAL to ensure logical correctness, compliance with timing constraints, and deadlock-free behavior. On the one hand, the verification process reveals that some BBBs exhibit edge cases that could lead to faulty behavior; these findings directly resulted in initial improvements to the reliability of the ADNA components. On the other hand, a key insight of this work is that UPPAAL is not sufficiently expressive to fully certify all BBBs. Overall, this work lays important groundwork for certifiable, fail-safe ADNA designs with potential real-world applications.

17:20
Research Group Forum: Trust as a First-Class Citizen: A Research Agenda for Trustworthy Organic Computing Systems

ABSTRACT. Computational trust has long been recognised as a fundamental building block of Organic Computing systems, as it directly influences the ability of autonomous subsystems to cooperate, negotiate, and self-organise in open, heterogeneous environments. Existing work has identified several distinct facets of trust, including reliability, credibility, and functional correctness, to capture its multifaceted nature in technical systems. However, in most practical approaches, only selected facets, most prominently reliability and correctness, are explicitly modelled and exploited, while the remaining dimensions are often ignored or treated only implicitly. This selective treatment leads to an incomplete and potentially misleading assessment of trust, which can impair decision-making in complex, interwoven systems where interactions are shaped by more than just technical correctness or throughput. In this article, we argue that trust must be conceptualised and operationalised as an integrated, multi-faceted construct, and we propose a comprehensive framework that jointly considers all identified facets. We illustrate the desired characteristics in two heterogeneous case studies and explain the integrated view of computational trust towards more robust, transparent, and context-sensitive trust assessments.