DASIP2025: WORKSHOP ON DESIGN AND ARCHITECTURES FOR SIGNAL AND IMAGE PROCESSING
PROGRAM FOR MONDAY, JANUARY 20TH

10:15-11:00 Session 2: Keynote Presentation 1
Location: HiPEAC - Room 8
10:15
Trends in Space Implementation of Deeptech for signal and data processing

ABSTRACT. The space sector has always been considered one of the most innovative and technologically advanced. This reputation, however, stems from the difficulties and harsh environment that satellites and spacecraft must endure. Massive data processing is only now reaching the space sector, driven by the huge amount of data generated and/or directly available 'on the edge', and it faces a set of obstacles that must be overcome to operate properly in this environment. A wide set of opportunities is opening up in the space sector, directly linked to the specific segment and quality approach selected: from the application of AI on large missions, where reliability is a must, to the low-end segment, where solutions try to mimic ground architectures under major constraints on power or performance. The future is open to major innovations in which new products will be natively compatible with cloud computing, edge computing, AI, and other deep tech. To conclude, it is worth mentioning that processing in space is intrinsically clean and green once the launch is accomplished.

11:00-11:30 Coffee Break
11:30-12:45 Session 3: DASIP Session

Session 1 

Location: HiPEAC - Room 8
11:30
CSD-Driven Speedup in RISC-V Processor

ABSTRACT. This paper introduces a synthesizable µ-architectural design method to boost the performance of a given RISC-V processor architecture by utilizing Canonical Signed Digit (CSD) representation during the execution stage of the processor pipeline. CSD is a unique ternary number system that enables carry/borrow-free addition/subtraction in constant time O(1), regardless of word length N. The CSD extension is demonstrated on the Potato processor, a simple RISC-V implementation for FPGAs, but in principle the method can be applied to other implementations as well. The performance gain from CSD comes at the cost of converting between binary and CSD representations. This overhead is compensated by an extension to a seven-stage pipeline architecture featuring a three-step execution stage, which increases throughput and operating frequency and enables loop unrolling; this is especially advantageous in applications with consecutive calculations, e.g., signal processing. In experiments, we compared our CSD-based ternary solution to the original implementation, which uses the usual pure binary representation of the operands. Our approach achieved a 2.41X increase in operating frequency over the original RISC-V processor on FPGA, with over 20% of this gain attributed to the CSD encoding. This enhancement resulted in up to a 2.40X improvement in throughput and a 2.37X reduction in execution time for computation-intensive benchmark applications.
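For readers unfamiliar with CSD: it represents a number with digits drawn from {-1, 0, 1} such that no two adjacent digits are nonzero (the non-adjacent form), which minimizes the number of nonzero digits and underlies the carry-free arithmetic the abstract mentions. A minimal reference sketch of the binary-to-CSD conversion, independent of the paper's hardware implementation:

```python
def to_csd(n: int) -> list[int]:
    """Convert an integer to canonical signed-digit (non-adjacent form)
    representation, least-significant digit first; digits are in {-1, 0, 1}."""
    digits = []
    while n != 0:
        if n % 2:
            d = 2 - (n % 4)   # +1 or -1, chosen so the next lower digit is 0
            n -= d
        else:
            d = 0
        digits.append(d)
        n //= 2
    return digits

def from_csd(digits: list[int]) -> int:
    """Evaluate a CSD digit list back to an integer."""
    return sum(d * (1 << i) for i, d in enumerate(digits))
```

For example, 7 becomes 100-1 in CSD (8 - 1), using two nonzero digits instead of the three needed in plain binary (111).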

11:50
Efficient FPGA Implementation of ViT Non-Linear Functions
PRESENTER: Hana Krichene

ABSTRACT. Building on the performance of Transformers, Vision Transformers (ViTs) have recently been applied across a wide range of Computer Vision (CV) tasks, demonstrating superior results compared to traditional convolutional neural networks (CNNs). However, the computational demands of ViTs remain a concern when deployed on edge devices, primarily due to the complexity of their various layers. Unlike CNNs, ViTs rely on multiple non-linear functions, such as Softmax, GELU, and LayerNorm, that significantly increase resource utilization and power consumption and can induce high latency. Addressing the lack of FPGA-based implementations of non-linear functions, this work proposes a resource-efficient FPGA solution that achieves a 5.8x reduction in Look-Up Tables (LUTs) and a 12.3x reduction in registers, with a throughput of 32 processed elements per cycle and a clock frequency of 200MHz, enabled by a reuse technique. Additionally, latency is reduced by 1.7x through a proposed three-stage pipelined architecture. The design is implemented on the Xilinx XCVU9P FPGA, leveraging decomposition techniques to resolve data dependencies and analyze similarities between the three non-linear functions in ViTs. A second-order mathematical approximation is employed to facilitate efficient synthesis of these non-linear functions on the FPGA.
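The abstract's "second-order mathematical approximation" can be illustrated generically: the paper's exact formulation is not given here, but a least-squares degree-2 polynomial fit to GELU over a bounded input range shows how a transcendental activation can be reduced to two multiplies and two adds, which synthesize cheaply on an FPGA.

```python
import numpy as np
from math import erf, sqrt

def gelu(x: float) -> float:
    # Exact GELU via the Gaussian CDF
    return 0.5 * x * (1.0 + erf(x / sqrt(2.0)))

# Fit a degree-2 polynomial over the interval inputs are expected to span
# (the [-4, 4] range is an illustrative assumption).
xs = np.linspace(-4.0, 4.0, 2001)
ys = np.array([gelu(x) for x in xs])
c2, c1, c0 = np.polyfit(xs, ys, 2)   # y ≈ c2*x^2 + c1*x + c0

def gelu_poly2(x):
    # Hardware-friendly evaluation: two multiplies, two adds
    return (c2 * x + c1) * x + c0

max_err = float(np.max(np.abs(gelu_poly2(xs) - ys)))
```

In practice such fits are done piecewise over several sub-intervals to drive the error down further; the single-interval fit above only demonstrates the principle.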

12:10
LiFT: Lightweight, FPGA-tailored 3D object detection based on LiDAR data
PRESENTER: Konrad Lis

ABSTRACT. This paper presents LiFT, a lightweight, fully quantized 3D object detection algorithm for LiDAR data, optimized for real-time inference on FPGA platforms. Through an in-depth analysis of FPGA-specific limitations, we identify a set of FPGA-induced constraints that shape the algorithm's design. These include a computational complexity limit of 30 GMACs (billion multiply-accumulate operations), INT8 quantization for weights and activations, 2D cell-based processing instead of 3D voxels, and minimal use of skip connections.

To meet these constraints while maximizing performance, LiFT combines novel mechanisms with state-of-the-art techniques such as reparameterizable convolutions and fully sparse architectures. Key innovations include the Dual-bound Pillar Feature Net, which boosts performance without increasing complexity, and an efficient scheme for INT8 quantization of input features.

With a computational cost of just 20.73 GMACs, LiFT stands out as one of the few algorithms targeting minimal-complexity 3D object detection. Among comparable methods, LiFT ranks first, achieving an mAP of 51.84% and an NDS of 61.01% on the challenging NuScenes validation dataset. The code will be available at https://github.com/vision-agh/lift.
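The INT8 constraint described above is commonly met with symmetric per-tensor quantization. The sketch below is a generic version of that idea, not LiFT's specific input-feature scheme: floats are mapped to the signed 8-bit range with a single scale factor, and dequantization recovers the value to within half a quantization step.

```python
import numpy as np

def quantize_int8(x: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric per-tensor INT8 quantization: map floats to [-127, 127]
    with a single scale. Generic sketch, not LiFT's exact input scheme."""
    max_abs = float(np.abs(x).max())
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale
```

With weights and activations both in INT8, the multiply-accumulate operations counted in the GMAC budget can run on narrow integer DSP slices instead of floating-point units.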

13:00-14:00 Lunch Break
14:00-15:30 Session 4: DASIP Session

Session 2

Location: HiPEAC - Room 8
14:00
A practical HW-aware NAS flow for AI vision applications on embedded heterogeneous SoCs
PRESENTER: Agathe Archet

ABSTRACT. Implementing efficient Deep Neural Networks (DNNs) for dense-prediction vision applications on embedded heterogeneous SoCs comes with many challenges, such as latency and energy constraints. To tackle them, we propose a novel and practical multi-objective Hardware-aware Neural Architecture Search (HW-NAS) framework able, for the first time, to handle complex search spaces while considering the hardware manufacturer’s expertise. This HW-NAS flow targeting Nvidia’s Orin SoCs relies on (1) a practical strategy to reduce the total exploration duration, and (2) a compact enhancement of the existing TensorRT deployment flow. On the FasterSeg search space, our framework can obtain a latency-power-mIoU Pareto front for multiple power modes in only 66 hours (-33 %) using 8 Nvidia A100 GPUs. Compared to default mappings, these results demonstrate that our novel mapping strategy can obtain practical solutions with either 50 % less power consumption or 80 % less latency at the same accuracy, or achieve better accuracy (+6 %) with 30 % less power consumption.
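The latency-power-mIoU Pareto front mentioned above is the set of candidate architectures not dominated in all objectives by any other candidate. A minimal, tool-agnostic sketch of that filtering step (the objective values below are made up for illustration, and this is not the paper's tooling):

```python
def pareto_front(points):
    """Return the non-dominated subset of objective tuples,
    where lower is better in every dimension (negate mIoU to fit)."""
    def dominates(a, b):
        # a dominates b: no worse everywhere, strictly better somewhere
        return all(x <= y for x, y in zip(a, b)) and \
               any(x < y for x, y in zip(a, b))
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical candidates as (latency_ms, power_W, -mIoU) tuples
candidates = [(10, 5, -0.70), (12, 4, -0.70), (11, 6, -0.65), (10, 5, -0.60)]
front = pareto_front(candidates)
```

In a HW-NAS flow this filter runs over measured deployments, so each point already reflects the target SoC and power mode rather than an analytical cost model.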

14:20
Endoscopy image classification for wireless capsules with CNNs on microcontroller-based platforms
PRESENTER: Paola Busia

ABSTRACT. Wireless Capsule Endoscopy (WCE) offers an important diagnostic instrument for different gastrointestinal diseases. Enhancing the WCE device with real-time image processing capabilities helps assist specialized physicians in the long and cumbersome process of inspecting the significant amount of data acquired during the examination procedure, providing a first detection of the signs of relevant diseases that require further attention. In this work, we evaluate different state-of-the-art Convolutional Neural Network models for real-time WCE image classification, focusing on lightweight topologies suitable for execution on low-power microcontroller platforms and integration on the WCE device. The selected WCE-SqueezeNet model achieves 98.5% accuracy in the classification of ulcerative colitis, polyps, and esophagitis against healthy samples, allowing classification at a 16 fps rate on the GAP9 multi-core platform, with 61 ms inference time and 30.6 mW average core power consumption.

14:40
Joint Underwater Depth Estimation and Dehazing from a Single Image using Attention U-Net
PRESENTER: Saqib Nazir

ABSTRACT. Underwater imaging presents unique challenges compared to open-air photography, primarily due to diminished visibility and geometric distortions, impeding the development of underwater Computer Vision (CV) and robotic vision perception. Previous methods relying on simplified image formation models for image enhancement have often yielded unsatisfactory results. This paper proposes a new deep learning-based architecture for joint depth estimation and dehazing from a single underwater monocular image, seeking to leverage the mutual benefits between these two interrelated tasks. The proposed architecture is a Two-Headed Depth Estimation and Dehazing Attention Network (2HDED:AttN) model with an end-to-end training approach. Comprehensive experiments on synthetic and real underwater datasets showcase the proposed architecture’s superior performance in jointly addressing underwater depth estimation and image dehazing tasks. The method effectively estimates the underwater depth and improves underwater image quality, paving the way for enhanced underwater computer and robotic vision applications.

15:00
KD-AHOSVD: Neural Network Compression via Knowledge Distillation and Tensor Decomposition
PRESENTER: Laura Meneghetti

ABSTRACT. In the field of Deep Learning, the high number of parameters in models has become a significant concern within the scientific community due to the increased computational resources and memory required for training and inference. Addressing this issue, we propose a novel tensorized technique to compress network architectures. Our approach aims to significantly reduce the network’s size and the number of parameters by integrating Averaged Higher Order Singular Value Decomposition with a novel Knowledge Distillation approach. Specifically, we replace certain layers of the original architecture with layers that perform linear projections onto a reduced space defined by our reduction technique. We conducted experiments on image classification tasks using multiple architectures and datasets. The evaluation focuses on final accuracy, model size, and parameter reduction, comparing our approach with both the original models and quantization, a widely used reduction method. The results underscore the effectiveness of our method in significantly reducing the number of parameters and the overall size of neural networks while maintaining high performance.
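The layer-replacement idea described above can be illustrated with plain truncated SVD as a stand-in for the paper's Averaged HOSVD: a dense weight matrix is replaced by two smaller factors whose product projects through a reduced space, cutting the parameter count from out·in to r·(out+in).

```python
import numpy as np

def low_rank_factorize(W: np.ndarray, rank: int):
    """Replace a dense layer's weight W (out x in) with factors
    A (out x r) and B (r x in) such that W ≈ A @ B.
    Plain truncated SVD here, not the paper's Averaged HOSVD."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]   # absorb singular values into the left factor
    B = Vt[:rank]
    return A, B
```

At inference, the single matrix multiply `W @ x` becomes two cheaper ones, `A @ (B @ x)`; knowledge distillation is then used to recover accuracy lost by the truncation.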

15:30-16:00 Coffee Break