DSD2021: EUROMICRO CONFERENCE ON DIGITAL SYSTEMS DESIGN 2021
PROGRAM FOR FRIDAY, SEPTEMBER 3RD

09:00-10:00 Session 15: KEYNOTE 5 DSD

Keynote 5 - Dr. Danilo Pietro Pau - Technical director, IEEE and ST Fellow, System Research and Applications, STMicroelectronics (Italy) - Characterizing deeply quantized neural networks for in-sensor computing

10:00-11:30 Session 16A: APPLICATIONS
10:00
High Speed Implementation of the Deformable Shape Tracking Face Alignment Algorithm

ABSTRACT. The 2D facial landmark alignment method, implemented in C++ in the open-source libraries DLIB and Deformable Shape Tracking (DEST), is used in several applications such as driver drowsiness detection, recognition of facial expressions, etc. The most challenging of these applications require fast processing of video frames. Therefore, the alignment of the facial landmarks in a single video frame has to be performed with the minimum possible latency without precision loss. In this paper, the DEST implementation of the face alignment method that is based on regression trees is heavily restructured to reduce latency. The resulting face alignment predictor is implemented in C. The elimination of multiple nested routine calls, excessive argument copying, type conversions and integrity checks leads to a software implementation that is 240 times faster than the one provided in the DEST library. Moreover, the structure of the new face alignment predictor is appropriate for hardware implementation on a Field Programmable Gate Array (FPGA) for further acceleration.

10:20
Highly Parallel Sample Rate Converter for Space Telemetry Transmitters

ABSTRACT. In recent years, following the rapid innovation guidelines of most space agencies, there have been major advances in satellite transmitter technologies. In particular, standards with very high performance and flexibility have been introduced (e.g. DVB-S2, DVB-S2X and CCSDS 131.2-B-1) to maximize efficiency and throughput. Moreover, the gradual adoption of higher transmission bands, which started with the S-band, moved to the X-band and now widely employs the Ku-/Ka-bands as well, is increasing the usable system bandwidths. For all these reasons, system integrators are pushing to develop architectures that can also dynamically change the payload symbol rate, and thus the bandwidth, to cover different target scenarios. Considering these wide bands and the fact that DAC clocks must be fixed to minimize jitter, the dynamic symbol rate may be achieved through the use of fractional sample rate interpolators. In this paper, the architecture of a massively parallel sample rate converter, with no backpressure availability on the modulator block, is presented. The analysis takes into account the problem of quantization effects in asynchronous sample rate converters, proposing a novel architecture to address this specific case. Implementation considerations and results are reported for the new Xilinx Space-Grade Kintex UltraScale XQRKU-060, considering an SRRC IQ-modulated signal compliant with the DVB-S2 standard.

10:40
Single-Frame Direct Reflectance Estimation With Indirect Time-of-Flight Cameras

ABSTRACT. Computer vision algorithms are influenced by variations in lighting conditions. Images that are independent of lighting conditions have the potential to improve the training of machine learning algorithms, especially in tasks like material and object classification, including face recognition. For cameras with an active illumination source, such as Time-of-Flight (ToF) cameras, the largest variation is caused by the distance between the camera and the object. ToF cameras provide dense 3D point clouds. In addition, they are capable of measuring infrared (IR) images. As ToF pixels are often larger than IR camera pixels, they are rarely considered for 2D-only imaging. However, the additional features of a ToF camera can be used to realize methods to record distance-normalised IR images. In this paper, we explore methods to extract direct reflectance estimates from ToF measurements. We propose two novel methods relying on coded modulation (CM) and compare them to a method that can convert data from the state-of-the-art continuous wave (CW) measurement method. With the CM-based methods we are able to realize the normalization in a single-frame measurement, compared to the four frames recorded by the CW method. All three methods are evaluated based on simulation results and in-laboratory measurements. We are able to demonstrate that our novel methods, relying on CM, can achieve the desired measurement behaviour.

11:00
Evaluation of Time Series Clustering in Embedded Sensor Platform

ABSTRACT. Clustering is one of the major problems in the study of time series data, yet solutions on embedded platforms are almost absent because of the limited computational resources at the edge. In this paper, two typical clustering algorithms, K-means and the Self-Organizing Map (SOM), together with the Euclidean distance measure and dynamic time warping (DTW), are studied to verify their feasibility on an embedded sensor platform. For the given datasets, the models are trained on a computer and moved to an ESP32 microprocessor for inference. It is found that the SOM achieves accuracy similar to K-means, while its inference process takes longer. The experimental results show that a sample with 300 data points can be clustered into 12 clusters within 40 ms by the SOM with DTW model, while the fastest model, K-means with Euclidean distance, runs in around 2 ms. In other words, it can consume the data from 40 sensors in 680 ms, which can be scheduled alongside the real-time data acquisition and transmission tasks. The gathered performance figures support the feasibility of deploying time series clustering models on embedded sensor platforms.
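As a point of reference for the distance measures named above, the sketch below shows a minimal dynamic-programming DTW distance in C. It is a generic textbook formulation, not the authors' ESP32 implementation; the function name and the two-row buffer layout are illustrative choices only.

```c
#include <float.h>
#include <math.h>
#include <stdlib.h>

/* Generic dynamic-programming DTW distance between two 1-D series,
   using a rolling two-row cost buffer (illustrative sketch only). */
double dtw_distance(const double *a, int n, const double *b, int m)
{
    double *prev = malloc((size_t)(m + 1) * sizeof *prev);
    double *curr = malloc((size_t)(m + 1) * sizeof *curr);
    if (!prev || !curr) { free(prev); free(curr); return -1.0; }

    for (int j = 0; j <= m; ++j) prev[j] = DBL_MAX;   /* DP boundary */
    prev[0] = 0.0;

    for (int i = 1; i <= n; ++i) {
        curr[0] = DBL_MAX;
        for (int j = 1; j <= m; ++j) {
            double d = fabs(a[i - 1] - b[j - 1]);         /* local cost        */
            double best = prev[j];                        /* step from (i-1,j) */
            if (curr[j - 1] < best) best = curr[j - 1];   /* step from (i,j-1) */
            if (prev[j - 1] < best) best = prev[j - 1];   /* step from (i-1,j-1) */
            curr[j] = d + best;
        }
        double *tmp = prev; prev = curr; curr = tmp;      /* roll the rows */
    }
    double result = prev[m];
    free(prev);
    free(curr);
    return result;
}
```

K-means or SOM clustering with DTW simply replaces the Euclidean distance call with a function of this kind when comparing a sample against the cluster prototypes, which is where the roughly 40 ms vs. 2 ms gap reported above comes from.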

10:00-11:30 Session 16B: FRAMEWORKS
10:00
A Deployment Framework for Quality-Sensitive Applications in Resource-Constrained Dynamic Environments

ABSTRACT. Traditional embedded systems and recent platforms used in emerging computing paradigms (e.g., fog computing) have resource limits and require their applications and services to be dynamically added (i.e., deployed) and removed at run-time. These applications often have non-functional (quality) requirements (e.g., end-to-end latency) which are only satisfied when sufficient resources are allocated to them. Hence, a run-time decision-maker is needed to optimize the deployments, in terms of resource budgets that are allocated to applications. Additionally, computing platforms have become heterogeneous in terms of their resources and the applications they execute. However, the existing deployment solutions are limited to specific resources and services. In this paper, we propose a run-time deployment framework that is more flexible in defining constraints and optimization goals and works with more heterogeneous resources and resource models than existing solutions. The framework is implemented on an embedded platform as a proof of concept.

10:30
ParalOS: A Scheduling & Memory Management Framework for Heterogeneous VPUs

ABSTRACT. Embedded systems are presented today with the challenge of rapidly evolving application diversity accompanied by increased programming and computational complexity. Customised heterogeneous System-on-Chip (SoC) processors emerge as an attractive HW solution in various application domains; however, they still require sophisticated SW development to provide efficient implementations, at the expense of slower adaptation to algorithmic changes. In this context, the current paper proposes a framework for accelerating the SW development of computationally intensive applications on Vision Processing Units (VPUs), while still enabling the exploitation of their full HW potential via low-level kernel optimisations. Our framework is tailored for heterogeneous architectures and integrates a dynamic task scheduler, a novel scratchpad memory management scheme, I/O & inter-process communication techniques, as well as a visual profiler. We evaluate our work on the Intel Movidius Myriad VPUs using synthetic benchmarks and real-world applications, which vary from Convolutional Neural Networks (CNNs) to computer vision algorithms. In terms of execution time, our results range from a limited ∼8% performance overhead vs optimised CNN programs to a 4.2x performance gain in content-dependent applications. We achieve up to a 33% decrease in scratchpad memory usage vs well-established memory allocators and up to 6x smaller inter-process communication time.

11:00
Scheduling Persistent and Fully Cooperative Instructions

ABSTRACT. A parallel, distributed two-level control system has been adopted in streaming application accelerators that implement atomic vector operations. Each instruction of such an architecture deals with one aspect (arithmetic, interconnect, storage, etc.) of an atomic vector operation. Such instructions are persistent and fully cooperative. Their lifetimes vary because of the vector size and the degree of parallelism. More complex constraints are also required to express the cooperation among these instructions. Conventional instruction behavior models are no longer suitable for such instructions. Therefore, we develop a novel instruction behavior model to address the scheduling aspect of the instruction set required by such an architecture. Based on the behavior model, we formally define the scheduling problem and formulate it as a constraint satisfaction optimization problem (CSOP). However, the naive CSOP formulation quickly becomes unscalable. Thus, a heuristic-enhanced scheduling algorithm is introduced to make the CSOP approach scalable. The enhanced algorithm's scalability is validated by a large set of experiments varying in problem size.

11:30-13:00 Session 17A: HW-SW CODESIGN AND RECONFIGURABILITY
11:30
An Investigation of Dynamic Partial Reconfiguration Offloading in Hard Real-Time Systems

ABSTRACT. Nowadays, complex Cyber-Physical Systems (CPSs) often exploit so-called computing at the edge (i.e., edge computing), where the Dynamic Partial Reconfiguration (DPR, also known as Dynamic Function eXchange or Partial Reconfiguration) feature has proven efficient in facing the adaptivity challenges typical of the CPS domain. In this context, the increase in both platform heterogeneity and required customization is leading to a growing number of DPR requests per task, for which industry is enhancing the reconfiguration controllers with the capability to offload more than one DPR request, in turn allowing pipelining between multiple DPR processes and application execution. Several works in the literature have introduced the DPR process into the hard real-time system domain, however without considering multiple DPR offloading capabilities. In this paper, we provide a theoretical analysis and a practical evaluation of multiple DPR offloading in the context of hard real-time systems. In particular, through a motivational case study, supported by experimental activities conducted on the Zynq-7000 SoC, we show how offloading multiple DPR requests can provide benefits with respect to the traditional approach of one DPR request at a time.

12:00
A Hardware/Software Concept for Partial Logic Updates of Embedded Soft Processors at Runtime

ABSTRACT. Embedded systems are built from various hardware components and execute software on one or more microcontroller units (MCU). These MCUs usually contain a fixed integrated circuit, thus disallowing modifications to their logic at runtime. While this keeps the instruction set architecture (ISA) fixed as well, it leaves the software as the only flexible part in the system. But what if the MCU logic could be easily changed at runtime in order to fix bugs, or if the ISA could be extended on-the-fly in order to introduce application-specific instructions and features on demand? This work demonstrates a concept for introducing more hardware flexibility through application-specific MCU modifications. To this end, the MCU is implemented as a soft core on a field-programmable gate array (FPGA) and we reconfigure its logic with support of the operating system (OS) running on it. The reconfiguration happens on-the-fly, so no interruption of the application code or even a system restart is required. To achieve this, (i) the MCU pipeline is specially designed for extensibility by new instructions, and (ii) the FPGA is selected to support partial self-reconfiguration of its logic cells at runtime. As long as an instruction is not yet part of the ISA, the OS supports its emulation to provide a consistent interface for applications. Apart from that, no special compiler support is required, but the application must provide either the emulation code or a hardware description for adding the required logic. For a proof of concept, we use a RISC-V based MCU on a Xilinx Artix-7 FPGA, and for evaluating the general benefit of our approach we use an algorithm that is costly when executed with the original ISA but fast with application-specific instructions added at runtime. The experimental evaluation also shows that the on-the-fly hardware update does not disrupt or compromise the software execution flow.

12:30
Metrics for the Evaluation of Approximate Sequential Streaming Circuits

ABSTRACT. The design of energy- and area-efficient systems is important for modern technology. One approach to increase these efficiencies is approximate computing. During the last years, efficient approximations for combinational hardware components, e.g., adders or multipliers, have been proposed.

We focus on quality metrics for the evaluation of approximations in sequential circuits with streaming inputs and outputs. We propose the use of sequence distance metrics for analysing the sequential behavior after approximation and compare their performance to other metrics such as mean errors and accumulated errors. We present case studies on several exemplary circuits. The experimental results show that our sequential metrics provide information beyond the common mean errors and, for stochastic applications, yield the best guidance in selecting approximate sequential circuits.
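For orientation, two of the common error metrics referred to above are typically defined over an output stream of length $N$ as follows; these are standard textbook definitions and not necessarily the exact formulations used by the authors:

\[
\mathrm{ME} = \frac{1}{N}\sum_{t=1}^{N}\left|y_t^{\mathrm{approx}} - y_t^{\mathrm{exact}}\right|,
\qquad
E_{\mathrm{acc}} = \sum_{t=1}^{N}\left(y_t^{\mathrm{approx}} - y_t^{\mathrm{exact}}\right).
\]

Sequence distance metrics, by contrast, compare the approximate and exact output sequences as a whole (e.g., via an alignment-based distance) rather than aggregating per-sample errors.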

11:30-13:00 Session 17B: MODELING AND SIMULATION 2
11:30
To Pin or Not to Pin: Asserting the Scalability of QEMU Parallel Implementation

ABSTRACT. Due to its speed in cross-executing sequential code, dynamic binary translation is the unchallenged technology for full system-level simulation. Among the translators, QEMU has become the de facto solution. It introduced parallel host execution of the target cores a few years ago for the ARM instruction set architecture, and this support is now also available, among others, for RISC-V. Given the popularity of these instruction sets in multi- and many-core systems, assessing the scalability of their parallel implementation makes sense. In this paper, we use a subset of the PARSEC benchmarks to measure the execution time of QEMU's parallel implementation, to which we added the ability to pin a target processor to a host core or hardware thread. We report the results of a wealth of experiments we performed on a 16-core/32-thread x86-64 SMP machine. They show that the support of parallelism in QEMU scales well, and that, somewhat counter-intuitively, pinning does not improve performance.
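For readers unfamiliar with pinning, the sketch below shows how a thread (such as a vCPU thread in a parallel simulator) can be bound to a single host CPU on Linux using the standard pthread affinity API. This is a generic illustration, not code taken from the modified QEMU described in the abstract.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

/* Bind the calling thread to one host CPU (generic Linux/pthreads sketch). */
static int pin_current_thread(int host_cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(host_cpu, &set);
    /* Returns 0 on success, an error number otherwise. */
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

int main(void)
{
    int err = pin_current_thread(0);   /* pin this thread to host CPU 0 */
    if (err != 0)
        fprintf(stderr, "pinning failed: error %d\n", err);
    /* ... the pinned thread's simulation work would run here ... */
    return 0;
}
```

Compile with -pthread; per-vCPU pinning, as studied in the paper, applies the same call inside each vCPU thread instead of the whole process.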

12:00
Gain and Pain of a Reliable Delay Model

ABSTRACT. State-of-the-art digital circuit design tools almost exclusively rely on pure and inertial delay for timing simulations. While these provide reasonable estimates at very low execution time in the average case, their ability to cover complex signal traces is limited. Research has provided the dynamic Involution Delay Model (IDM) as a promising alternative, which was shown (i) to depict reality more closely and, recently, (ii) to be compatible with modern simulation suites. In this paper we complement these encouraging results by experimentally exploring the behavioral coverage for more advanced circuits.

In detail, we apply the IDM to three simple circuits (a combinatorial loop, an SR latch and an adder), interpret the delivered results and evaluate the overhead in realistic settings. Comparisons to digital (inertial delay) and analog (SPICE) simulations reveal that the IDM delivers very fine-grained results, which match analog simulations very closely. Moreover, severe shortcomings of inertial delay become apparent in our simulations, as it fails to depict a range of malicious behaviors. Overall, the Involution Delay Model hence represents a viable upgrade to the delay models available in modern digital timing simulation tools.

12:20
Heterogeneous Communication Virtualisation for Distributed Embedded Applications

ABSTRACT. Distributed embedded applications (DEAs) are typically implemented on diverse embedded nodes interconnected through communication network(s) to exchange data and control information to achieve the desired functionality. Conventional approaches of utilising a single large-bandwidth link in a distributed system are not efficient in large DEAs owing to diverse requirements and factors like cost, reliability, scalability and criticality, among others. Heterogeneous communication is a promising approach in DEAs, where the diverse nature of the underlying protocols (wired/wireless, synchronous/asynchronous, multiple access modes and others) can be leveraged to meet such requirements, in addition to benefits like aggregated bandwidth and robustness. However, utilising them 'directly' places significant complexity on the application, as it needs to dynamically evaluate the channels and utilise different protocol structures for each case. Virtualising the communication channels would present a unified interface to the application by abstracting away low-level details, similar to virtualisation applied in compute architectures. However, unlike architecture virtualisation, virtualising heterogeneous communication, particularly for resource-constrained device networks, involves unique challenges imposed by the physical (wired/wireless) and logical domains (limited bandwidth, small payloads, protocols, channel access schemes, etc.), which need to be concurrently evaluated to optimise the communication system. This paper presents a model and an optimal transmission strategy as a proof of concept for deploying heterogeneous communication in DEAs. The model is described at an abstracted level while capturing transmission parameters of multiple channels, which are then optimised to meet the application's communication requirements. The model and the optimisation method are validated through simulation and a practical case study.

12:50
NMPO: Near-Memory Computing Profiling and Offloading

ABSTRACT. Real-world applications now process big data sets and are often bottlenecked by the data movement between the compute units and the main memory. Near-memory computing (NMC), a modern data-centric computational paradigm, can alleviate these bottlenecks by performing computation close to where the data resides. The lack of available NMC systems makes simulators the primary evaluation methodology for performance estimation. However, simulators are usually time-consuming, and methodologies that can reduce this overhead would help in the early-stage design of NMC systems. This work proposes Near-Memory computing Profiling and Offloading (NMPO), a high-level framework capable of predicting NMC offloading suitability employing an ensemble machine learning model. NMPO predicts NMC profitability with an accuracy of 85.6% and, compared to prior works, can reduce the prediction time by 2 to 3 orders of magnitude by using hardware-dependent application features.

15:30-16:30 Session 19A: Dependability, Testing and Fault Tolerance in Digital Systems 1
15:30
Automated Debugging-Aware Visualization Technique for SystemC HLS Designs

ABSTRACT. High-Level Synthesis (HLS) using the system-level modeling language SystemC at the Electronic System Level (ESL) is being increasingly adopted by the semiconductor industry to raise design productivity. However, errors in the high-level design can propagate down to the low-level implementation and become very costly to fix. Thus, SystemC HLS verification and debugging are necessary and important. While monitoring simulation behavior is a straightforward solution to debug a given design in the case of an error (reported by verification), it can become a very time-consuming process, as a large amount of data that is not necessarily relevant to the source of the error is analyzed. In this paper, we propose a fast and automated debugging-aware visualization approach, enabling designers to monitor the portion of a given SystemC HLS design's simulation behavior that is related to the erroneous output(s). Experimental results on an extensive set of standard SystemC HLS designs show the effectiveness of our approach in localizing the designs' simulation behavior in terms of the number of visualized variables. In comparison to traditional visualization methods, our proposed approach obtains up to 96% and 91% reduction in the search space for single and multiple faulty outputs, respectively.

16:00
Search Strategy of Large Nonlinear Block Codes

ABSTRACT. Test pattern compression techniques reduce the amount of transferred data while the flexibility of the decompressed patterns remains high. The flexibility can be measured by the guaranteed number of specified test pattern bits, which has to be equal to or greater than the number of care bits in the pattern. This paper addresses the problem of test pattern compression and decompression using nonlinear codes. Test patterns can be encoded using a linear code that leads to an XOR network that decompresses the compressed patterns transferred from a tester. This XOR network can be completed or replaced by a standard (nonlinear) combinational network with substantially improved compression efficiency. The problem of finding the code was formulated as a clique cover problem. The bottleneck of this approach lies in the computational complexity and the memory capacity needed to temporarily retain intermediate results of the clique-coverage search attempts. The paper proposes a method that substantially reduces the computational time needed to obtain a robust nonlinear code with decompression ability. It shows that random search algorithms cannot reach the code parameters obtained by the restricted iterative search proposed in the paper. Additional knowledge concerning the check-bit truth table characteristics was used to avoid unsuccessful investigations. The proposed decompressor schemes have higher efficiency than previously known decompressor constructions. On the other hand, it is challenging to find other heuristics that provide efficient codes for a broader class of code parameters than the proposed solution.

15:30-16:30 Session 19B: Future Trends in Emerging Technologies (FTET)
15:30
Design for Restricted-Area and Fast Dilution using Programmable Microfluidic Device based Lab-on-a-Chip

ABSTRACT. The microfluidic lab-on-a-chip has emerged as a new technology for implementing biochemical protocols on small-sized portable devices targeting low-cost medical diagnostics. Among various efforts in the fabrication of such chips, the programmable microfluidic device (PMD) is a relatively new technology for the implementation of flow-based lab-on-a-chips. A PMD chip is suitable for automation due to its symmetric nature. In order to implement a bioprotocol on such a reconfigurable device, it is crucial to automate sample preparation on the chip as well. Sample preparation, which is a front-end process to produce the desired target concentrations of the input reagent fluid, plays a pivotal role in every bioassay or bioprotocol. In this paper, first, a method referred to as dilution algorithm in two steps (DATS) is discussed, which needs only two diluting operations to achieve any target concentration. Then, we explain another method, called dilution algorithm on a small dilution area (DASDA), which needs less area than DATS. Finally, we propose a heuristic for efficient dilution of biochemical fluids using a PMD chip, referred to as dilution algorithm in a restricted dilution area (DARDA), that produces a more accurate (with less error) target concentration value on a restricted area of the PMD chip in a shorter mixing time. Simulation results reveal that DARDA outperforms a state-of-the-art dilution algorithm applicable to PMD chips in terms of three performance parameters, namely mixing time, mixing area and error in target concentration.

16:00
Combining SWAPs and Remote CNOT Gates for Quantum Circuit Transformation

ABSTRACT. Quantum computers offer enormous speed advantages over their classical counterparts. Still, optimization of quantum circuits is necessary to further increase their potential. Additionally, physical realizations of quantum computers place restrictions on quantum circuits regarding the available quantum gates. In order to satisfy these restrictions, non-native gates need to be expressed as an equivalent cascade of natively available quantum gates, which induces a mapping overhead. Two complementary approaches to this problem are to move the qubits around (using SWAP gates) or to apply so-called remote gates, i.e. pre-computed cascades of native gates which keep the qubit placement. In this paper, we explore how combinations of movements and remote gates can be employed to reduce the required overhead regarding the number of native gates as well as the circuit depth. We also discuss ways to find out which qubits to address with the movements in order to optimize these metrics. Our general evaluation is supplemented by evaluations on two IBM quantum computer architectures to show how quantum circuits can be optimized by the presented patterns.
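To illustrate where the mapping overhead mentioned above comes from, recall the standard decomposition of a SWAP gate into three CNOTs (a well-known identity, given here for orientation only):

\[
\mathrm{SWAP}_{q_1,q_2} \;=\; \mathrm{CNOT}_{q_1 \to q_2}\,\cdot\,\mathrm{CNOT}_{q_2 \to q_1}\,\cdot\,\mathrm{CNOT}_{q_1 \to q_2}.
\]

Each qubit movement therefore adds three native two-qubit gates (plus possible single-qubit gates to reverse the CNOT direction on hardware with directed couplings), a cost that approaches combining movements with remote gates try to reduce.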

16:30-17:30 Session 20A: Dependability, Testing and Fault Tolerance in Digital Systems 2
16:30
Maximizing the Switching Activity of Different Modules Within a Processor Core via Evolutionary Techniques

ABSTRACT. One key aspect to be considered during device testing is the minimization of the switching activity of the circuit under test (CUT), thus avoiding possible problems stemming from overheating it. But there are also scenarios where maximizing the switching activity of certain circuit modules can prove useful (e.g., during burn-in) in order to exercise the circuit under extreme operating conditions in terms of temperature (and temperature gradients). Resorting to a functional approach based on Software-Based Self-Test guarantees that the high induced activity cannot damage the CUT nor produce any yield loss. However, the generation of effective and suitable test programs remains a challenging task. In this paper we consider a scenario where the modules to be stressed are sub-modules of a fully pipelined processor. We present a technique, based on an evolutionary approach, able to automatically generate stress test programs, i.e., sequences of instructions achieving a high toggling activity in the target module. The processor we used for our experiments is the OpenRISC 1200. Results demonstrate that the proposed method is effective in achieving a high value of sustained toggling activity.

17:00
An Automated Setup for Large-Scale Simulation-Based Fault-Injection Experiments on Asynchronous Digital Circuits

ABSTRACT. Experimental fault injection is an essential tool in the assessment and verification of fault-tolerance properties. Often, in these experiments it is impossible to reasonably cover the huge parameter space spanned by target state and fault parameters, and compromises or restrictions must be made. This is even more pronounced for asynchronous circuits where a convenient discretization of time through a synchronous clock is not possible. In this paper we present a fault-injection toolset that allows for a very efficient injection and data processing, thus bringing studies with many billions of meaningful injections into asynchronous targets within reach. The key ingredients of our solution are an auto-setup feature capable of optimizing parameter values, seamless distribution of the simulation load to many host computers, and efficient arrangement of the important settings and readings in a database. We will use the example of a comparative study of different asynchronous pipeline styles to motivate the need for such an approach and illustrate its benefits.

16:30-17:30 Session 20B: Security and Privacy of Cyber-Physical Systems (SPCPS) 1
16:30
Towards Post-Quantum Enhanced Identity-based Encryption

ABSTRACT. Identity-based encryption (IBE) is a type of public-key encryption (PKE) that employs an identifier as the basis for the encryption mechanism. Thus, the communicating parties are able to encrypt messages (or verify signatures) without any prior setup between users or distribution of user certificates. This is especially relevant in many mission-critical applications, usually characterized by constrained end-points. MIKEY-SAKKE uses this concept to build a highly scalable protocol able to secure cross-platform multimedia communications. However, MIKEY-SAKKE is based on cryptographic primitives that will no longer be secure when sufficiently powerful quantum computers are built. To this end, this paper presents three contributions. First, it evaluates the performance of MIKEY-SAKKE on constrained embedded devices. Second, it extracts the requirements that post-quantum cryptographic primitives should meet in order to allow a plug-and-play replacement of the threatened security primitives with quantum-secure primitives. Third, it benchmarks the different post-quantum primitives running in the NIST standardization process and analyses their impact on the quantum-secure MIKEY-SAKKE. The results show that none of the NIST finalists perfectly meet all the specified requirements to achieve a post-quantum plug-and-play approach. However, the different combinations of post-quantum KEMs and signature schemes offer a range of trade-offs compared to SAKKE and ECCSI, either having slower computation or larger keys and ciphertexts/signatures, or both.

17:00
Digital Forensics, Video Forgery Recognition, for Cyber Security Systems

ABSTRACT. The recent advances in mobile video recording systems and the ease of access to them have led to an increased number of videos being streamed and uploaded on the internet daily. This has had a major impact on the expansion rate of the Internet of Things with the addition of affordable security systems that capture and stream video content. These systems regularly come with minimal security standards despite the fact that their goal is to provide court evidence. Criminals are often able to easily access, steal and manipulate videos to the point where it is often impossible to tell the difference between an original and a forged one. Moreover, the advances in machine learning have led to an increase of computer-generated videos known as deep fakes that are used for humiliation, sabotage, threats and propaganda in every aspect of modern life, whether social, political or military. As expected, the integrity of a video cannot be taken for granted and should in many cases be evaluated. The scientific branch of Computer Forensics has proposed various methods, both active and passive, for protecting the confidentiality and integrity of videos either at the source or on acquisition. In spite of that, the need for more robust and effective methods grows as videos increase in size and number while compression algorithms become increasingly efficient. The goal of this paper is to propose a new forgery detection method based on the characteristics of Dense Optical Flow. This method is applied on a Raspberry Pi 4B, which serves to simulate an IoT environment with a low-cost device that is capable of recognising forgery in static CCTV-captured videos. The tool is tested using two datasets of forged videos and proves that this method is effective in detecting copy-move, insertion and deletion forgeries while being robust against medium rates of compression and noise. Ultimately, this work proposes a methodology for automating the identification process as well as recommendations for future research.

17:20
Revealing the secrets of Spiking Neural Networks: the case of Izhikevich neuron

ABSTRACT. Spiking Neural Networks (SNNs) are a strong candidate for the future of machine learning applications. SNNs can obtain the same accuracy as complex deep learning networks while using a fraction of their power. As a result, an increase in the popularity of SNNs is expected in the near future for Cyber-Physical Systems, especially in the Internet of Things (IoT) segment. However, SNNs work very differently from conventional neural network architectures. Consequently, applying SNNs in the field might introduce new, unexpected security vulnerabilities. This paper explores and identifies potential sources of information leakage for the Izhikevich neuron, which is a popular neuron model used in digital implementations of SNNs. Simulations and experiments on actual neuromorphic hardware show that timing and power can be used to infer important information about the internal functionality of the network. Additionally, it is demonstrated that a reverse engineering attack is practical when both power and timing information are used.
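For reference, the Izhikevich neuron mentioned above is commonly stated in its standard two-variable form (given here for orientation; the paper's digital implementation may discretize or parameterize it differently):

\[
\frac{dv}{dt} = 0.04\,v^{2} + 5v + 140 - u + I,
\qquad
\frac{du}{dt} = a\,(b\,v - u),
\]
with the after-spike reset
\[
\text{if } v \geq 30\ \mathrm{mV}: \quad v \leftarrow c, \quad u \leftarrow u + d,
\]
where $v$ is the membrane potential, $u$ a recovery variable, $I$ the input current, and $a, b, c, d$ the parameters that select the firing pattern. Side channels such as timing and power can leak information precisely because these parameters shape when and how often a digital implementation spikes.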

17:30-18:30 Session 21A: Intelligent Transportation Systems (ITS) 3 AND Mixed-Criticality System Design (MCSDIA)
17:30
Runnable Configuration in Mixed Classic/Adaptive AUTOSAR Systems by Leveraging Nondeterminism

ABSTRACT. Classic AUTOSAR is a de facto standard for implementing automotive safety-critical systems. Building on its success, Adaptive AUTOSAR has been recently introduced to target applications that need both the high-performance, low real-time capability of an Adaptive platform and the high real-time, low-performance capability of a Classic one. The Adaptive part of these applications is mostly not required to conform to the highest ASIL C/D safety requirements, where computations that depend on each other are always executed in the same order before a certain time point. Therefore, this paper proposes that deterministic execution also be relaxed for the Classic part of the applications.

18:00
Enabling Unit Testing of Already-Integrated AI-based Software Systems: The Case of Apollo for Autonomous Driving

ABSTRACT. The advanced AI-based software used for autonomous driving comprises multiple highly-coupled modules that are data and control dependent. Deploying those already-integrated software frameworks makes unit testing, a fundamental step in the validation process of critical software, very challenging in safety-critical systems. To tackle this issue, in this paper, we show the steps we followed to develop standalone versions of the modules in an industry-level autonomous driving framework (Apollo) by applying several modifications to its architectural design. We show how the standalone modules have the same functional behavior as their integrated counterpart modules. We exemplify the benefits of standalone modules by performing incremental analysis of the software timing requirements of each module running on a heterogeneous System on Chip (SoC). This is a mandatory step to consolidate and integrate software modules guaranteeing timing constraints (e.g. related to freedom from interference) while maximizing SoC utilization.

17:30-18:30 Session 21B: Dependability, Testing and Fault Tolerance in Digital Systems 3
17:30
Automatic Design of Fault-Tolerant Systems for VHDL and SRAM-based FPGAs

ABSTRACT. This paper presents and evaluates the possibility of automatically designing fault-tolerant systems from unhardened systems. We present an overview of our toolkit with its three main components: 1) fault-tolerant structure insertion (which we call helpers); 2) fault-tolerant structure selection (called guiders); and 3) automatic testbed generation, incorporating advanced techniques to accelerate the test and evaluation. Our approach targets complete independence of the HW description language and its abstraction level; however, for our case study, we focus on VHDL in combination with fine-grained n-modular redundancy. In the case study part of this paper, we show that it is clearly beneficial to select a proper fault-tolerance method for each partition separately. Three experimental systems were produced using our method, two of which achieved a better reliability parameter while even lowering their chip area, compared to static allocation of an equivalent fault-tolerance technique. In the case study, we target the best median time to failure, the so-called t50; however, our method does not depend on this parameter, and an arbitrary optimization target can be selected, as long as it is measurable.

17:50
Reliability Analysis of the FPGA Control System with Reconfiguration Hardening

ABSTRACT. Computing power is important in space applications, where the utilization of FPGAs is very useful. However, FPGAs are susceptible to the effects of radiation, which can cause malfunctions. Particularly dangerous are configuration memory faults known as Single Event Upsets (SEUs), which can lead to failure of the entire system. Therefore, fault-tolerance techniques are used to prevent system failures. The main motivation for the use of these techniques is to maintain the correct behavior of the system despite the occurrence of faults. In addition to fault masking, which only delays system failures due to fault accumulation, fault mitigation by partial dynamic reconfiguration was used. Everything needed is provided by the reconfiguration controller, which is a necessary additional component of the entire system. It is also very convenient to be able to detect the occurrence of a fault in the system, so that the system need not be restored unnecessarily, which saves needless work by the controller. The key part is the evaluation of the fault resilience of the system using reconfiguration of the damaged parts. In all experiments, an experimental platform was used that emulates an electromechanical system, consisting of a robot control unit on an FPGA and a simulation of its behavior on a PC. Artificial faults have been injected into this controller on the FPGA. Furthermore, reliability estimation data, which was collected from our previously published simulations, was verified on a real system in our current experiments.

18:10
Implementation-Independent Test Generation for a Large Class of Faults in RISC Processor Modules

ABSTRACT. In this paper, a novel concept is proposed for generating tests of RISC processors relying on pure functional information without knowledge of the structural implementation details. For the first time, the effect-cause idea is applied for test generation, instead of the traditional cause-effect fault driven approach. The generated tests will be better suitable for fault diagnosis, and the method allows for extending the classes of faults, detected by tests. The motivation of this work is twofold. First, to give the possibility of test generation in the first stages of design and to allow for a systematic way to assess the test quality of a functional design. Second, to extend the fault class beyond the traditional Stuck-at-Fault (SAF) class, such as functional faults similar to that of covered by the March algorithm in memory testing. A special target of the paper is to use the proposed functional fault model for implementation-independent generation of test sequences for delay faults when the list of paths is not given. To reduce the complexity of test generation, two-level partitioning of the system is used, first, into Modules Under Test (MUT), and second, each MUT into control (CP) and data parts (DP). A novel constraints-based high-level functional fault model is proposed for the CP. For the DP, pseudo-exhaustive patterns are used for testing. By experimental research it was demonstrated that the test quality of the proposed implementation-independent test generation method produces test sequences with comparable or better fault coverages for SAFs and Transition Delay Faults (TDF) than known methods which utilize knowledge about implementation details.