PROGRAM
Monday, August 25th
10:00-10:30 Session 6A: HiPES.1
Chair:
Raffaele Montella (Università degli Studi di Napoli Parthenope, Italy)
Location: BAR 205
10:00 | A framework for flooding early warning leveraging AI, HPC, and computing continuum (abstract) |
10:10-10:30 Session 7: WSCC.1
Chair:
Massimo Torquati (University of Pisa, Italy)
Location: BAR I88
10:10 | Efficient FPGA-based GAN Accelerator Core for Edge-AI Platforms (abstract) |
10:30-11:00 Coffee Break
11:00-12:20 Session 8A: WSCC.2
Chair:
Massimo Torquati (University of Pisa, Italy)
Location: BAR I88
11:00 | Simplifying distributed workflows: A portable approach for Cloud and HPC (abstract) |
11:20 | HPC Software as a Service: A Flexible Approach to Data Logistics (abstract) |
11:40 | A Holistic Approach to Complexity Management and Multidimensional Analysis in Computing Continuum (abstract) |
12:00 | Light Weight Scalable DevOps for Cloud Robotics (abstract) |
11:00-12:30 Session 8B: DynResHPC.1
Chair:
Sergio Iserte (Barcelona Supercomputing Center, Spain)
Location: BAR 106
11:00 | Design Principles of Dynamic Resource Management for High-Performance Parallel Programming Models (abstract) |
11:30 | A Case Study for Resolving Composability Issues Using a Shared CPU Resource Coordinator (abstract) |
12:00 | Experimental Evaluation of Scheduling Strategies for Evolving Workflow-Based Applications (abstract) |
11:00-12:30 Session 8C: HiPES.2
Chair:
Raffaele Montella (Università degli Studi di Napoli Parthenope, Italy)
Location: BAR 205
11:00 | Thread Monitoring Tool: transparent characterization of threading patterns with eBPF (abstract) |
11:30 | Accelerating SWIRL Workflows: A High-Performance Rust Backend for Distributed Execution (abstract) |
12:00 | Building Parallel Machine Learning Workflows in PyCOMPSs: The Case Study of Tsunami Forecasting (abstract) |
11:00-11:50 Session 8D: GraphSys.1
Chair:
Tiziano De Matteis (Vrije Universiteit Amsterdam, Netherlands)
Location: BAR 218
11:00 | A Comparative Study of Streaming Graph Processing Systems (abstract) |
11:25 | A Unified Ontology for Scalable Knowledge Graph–Driven Operational Data Analytics in High-Performance Computing Systems (abstract) |
12:30-14:00 Lunch Break
14:00-15:30 Session 11A: DynResHPC.2
Chair:
Sergio Iserte (Barcelona Supercomputing Center, Spain)
Location: BAR 106
14:00 | Comparative Analysis of Algorithms for Malleability Decision-Making in Applications and File Systems (abstract) |
14:30 | Malleability in LAIK with MPI Dynamic Processes and PSets (abstract) |
15:00 | Dynamic Data Redistribution for Malleable MPI Frameworks through Virtual Topologies (abstract) PRESENTER: Ahmad Tarraf |
15:15 | Dynamic reconfiguration for malleable applications using RMA (abstract) |
14:00-14:30 Session 11B: HiPES.3
Chair:
Raffaele Montella (Università degli Studi di Napoli Parthenope, Italy)
Location: BAR 205
14:00 | A Computer-aided Framework for Detecting Osteosarcoma in Computed Tomography Scans (abstract) |
14:30-15:00 Session 12: HiPES Panel: Discussing the vision about the high-performance cloud computing in eScience application
Location: BAR 205
14:40-15:30 Session 13: GraphSys.2
Chair:
Tiziano De Matteis (Vrije Universiteit Amsterdam, Netherlands)
Location: BAR 218
14:40 | Efficient handling of sparse vectors for parallel nonblocking execution in GraphBLAS (abstract) |
15:05 | Millibenchmarking: Using Graph Sampling for Ranking GPU PageRank Implementations (abstract) |
15:30-16:00 Coffee Break
17:00-17:30 Session 17: GraphSys: Panel
Chair:
Jože M. Rožanec (Jožef Stefan Institute, Slovenia)
Location: BAR 218
Tuesday, August 26th
09:00-09:50 Session 19B: VHPC Keynote Tutorial: Writing a hypervisor from scratch. Seiya Nuta, Vercel Inc.
Location: BAR 218
09:50-10:30 Session 23: VHPC.1
Chair:
Michael Alexander (Austrian Academy of Sciences, Austria)
Location: BAR 218
09:50 | Enabling RDMA and GPUs in Rootless Kubernetes for Accelerated HPC and AI Applications (abstract) |
10:00-10:30 Session 24: HeteroPar.1
Chair:
José Cano (University of Glasgow, UK)
Location: BAR 205
10:00 | Open, cross-architecture acceleration of data analytics with SYCL and RISC-V (abstract) |
10:30-11:00 Coffee Break
11:00-12:30 Session 25A: PECS.1
Chair:
Romolo Marotta (Università degli Studi Roma Tre, Italy)
Location: BAR 106
11:00 | Evaluating Energy Efficiency of Genomics Algorithms on Processing-in-Memory Architectures (abstract) |
11:30 | SYCL for Energy-Efficient Computational Astrophysics: the case of DPEcho (abstract) |
12:00 | Alumet: a modular framework to standardize the measurement of energy consumption (abstract) |
11:00-12:30 Session 25B: HeteroPar.2
Chair:
José Cano (University of Glasgow, UK)
Location: BAR 205
11:00 | Federated Learning in the Edge-Cloud Continuum: A Task-Based Approach with Colony (abstract) PRESENTER: Alessio Orsino |
11:30 | OpenDwarfs 2025: Modernizing the OpenDwarfs Benchmark Suite for Heterogeneous Computing (abstract) |
12:00 | Portable High-Performance Kernel Generation for a Computational Fluid Dynamics Code with DaCe (abstract) |
11:00-12:00 Session 25C: VHPC.2
Chair:
Michael Alexander (Austrian Academy of Sciences, Austria)
Location: BAR 218
11:00 | Performance Analysis of Container-in-VM Architectures: A Study on Hypervisor Isolation and Lightweight OS Integration (abstract) |
11:30 | WebAssembly and Unikernels: A Comparative Study for Serverless at the Edge (abstract) |
12:30-14:00 Lunch Break
14:00-15:30 Session 26A: PECS.2
Chair:
Romolo Marotta (Università degli Studi Roma Tre, Italy)
Location: BAR 106
14:00 | Mixed precision over GPU applied to a Microphysics model (abstract) |
14:30 | Comparative Analysis of Energy Efficiency in Actor-Based Applications in Distributed Environments (abstract) |
15:00 | HPC Benchmark Game: Comparing Programming Languages Regarding Energy-Efficiency for Applications from the HPC Field (abstract) |
14:00-15:30 Session 26B: HeteroPar.3
Chair:
José Cano (University of Glasgow, UK)
Location: BAR 205
14:00 | Cyclic Data Streaming on GPUs for Short Range Stencils Applied to Molecular Dynamics (abstract) PRESENTER: Martin Rose |
14:30 | A Portable Branch-and-Bound Algorithm for Cross-Architecture Multi-GPU Systems (abstract) |
15:00 | Tracking the Critical Path of Execution for GPU Offloading Applications (abstract) |
15:30-16:00 Coffee Break
16:00-17:00 Session 27A: PECS.3
Chair:
Romolo Marotta (Università degli Studi Roma Tre, Italy)
Location: BAR 106
16:00 | Analysis of the carbon footprint of HPC (abstract) |
16:30 | Quantifying the Energy Consumption and Carbon Emissions of LLM Inference via Simulations (abstract) |
16:00-17:30 Session 27B: HeteroPar.4
Chair:
José Cano (University of Glasgow, UK)
16:00 | SIMON: A Simple Monitoring Framework for Heterogeneous Application Observability (abstract) |
16:30 | Exploiting highly heterogenous systems with stencil applications (abstract) |
17:00 | Green Energy Aware Scheduling of Scientific Workflows with Flexible Deadlines (abstract) |
Wednesday, August 27th
09:00-09:30 Session 30: Opening Session
Chair:
Wolfgang E. Nagel (TU Dresden, Germany)
Location: BAR SCHÖ
09:30-10:30 Session 31: Keynote 1: Martin Schulz
Chair:
Wolfgang E. Nagel (TU Dresden, Germany)
Location: BAR SCHÖ
10:30-11:00 Coffee Break
11:00-12:30 Session 32A: Track 2.1: Scheduling, Resource Management, Cloud, Edge Computing, and Workflows
Chair:
Rosa María Badia (Barcelona Supercomputing Center, Spain)
Location: BAR 205
11:00 | ARC-V: Vertical Resource Adaptivity for HPC Workloads in Containerized Environments (abstract) PRESENTER: Jacob Wahlgren |
11:20 | An Autonomy Loop for Dynamic HPC Job Time Limit Adjustment (abstract) PRESENTER: Thomas Jakobsche |
11:40 | Enabling Elasticity in Scientific Workflows for High Performance Computing Systems (abstract) PRESENTER: Rajat Bhattarai |
12:00 | WAPA: A Workload-Agnostic CPI-Based Thread-to-Core Allocation Policy (abstract) PRESENTER: Marta Navarro |
11:00-12:30 Session 32B: Track 3.1: Neural Network Acceleration and Optimization
Chair:
Dora Blanco (University of Santiago de Compostela, Spain)
Location: BAR 106
11:00 | FDHA: Fusion-Driven Heterogeneous Accelerator for Efficient Diffusion Model Inference (abstract) |
11:20 | CoQMoE: Co-Designed Quantization and Computation Orchestration for Mixture-of-Experts Vision Transformer on FPGA (abstract) |
11:40 | SkipNZ: Non-Zero Value Skipping for Efficient CNN Acceleration (abstract) PRESENTER: Jinhyeok Choi |
12:00 | BATCH-DNN: Adaptive and Dynamic Batching for Multi-DNN Accelerators (abstract) PRESENTER: Piyumal Ranawaka |
11:00-12:30 Session 32C: Track 6.1: Memory and I/O Systems
Chair:
Bettina Schnor (University of Potsdam, Germany)
Location: BAR 218
11:00 | NetSenseML: Network-Adaptive Compression for Efficient Distributed Machine Learning (abstract) PRESENTER: Yisu Wang |
11:20 | Breaking the I/O Barrier: 1.2 Tb/s Ethernet Packet Processing on a GPU (abstract) |
11:40 | GECKO: A Write-optimized Hybrid Index based on Disaggregated Memory (abstract) |
12:00 | Scalable OpenMP Remote Offloading via Asynchronous MPI and Coroutine-Driven Communication (abstract) PRESENTER: Jhonatan Cléto |
11:00-12:30 Session 32D: PhD Symposium Poster Pitch Session
Chairs:
Leonel Sousa (INESC-ID, Instituto Superior Técnico, Universidade de Lisboa, Portugal)
Michael Färber (ScaDS.AI & TU Dresden, Germany)
Location: BAR I88
12:30-14:00 Lunch Break
14:00-15:00 Session 33A: Track 1.1: Performance Analysis and Simulation
Chair:
Olaf Krzikalla (Deutsches Zentrum für Luft- und Raumfahrt (DLR), Germany)
Location: BAR 205
14:00 | Making MPI Collective Operations Visible: Understanding Their Utility and Algorithmic Insights (abstract) PRESENTER: Anna-Lena Roth |
14:20 | TSim4CXL: Trace-driven Simulation Framework for CXL-based High-Performance Computing Systems (abstract) PRESENTER: Jaewoo Son |
14:40 | THAPI: Tracing Heterogeneous APIs (abstract) PRESENTER: Brice Videau |
14:00-15:00 Session 33B: Track 6.2: Learning systems
Chair:
Salvador Petit (Universitat Politècnica de València, Spain)
Location: BAR 218
14:00 | SQ-DeAR: Sparsified and Quantized Gradient Compression for Distributed Training (abstract) PRESENTER: Xinrui Yang |
14:20 | Accelerating Independent Multi-Agent Reinforcement Learning on Multi-GPU Platforms (abstract) PRESENTER: Samuel Wiggins |
14:40 | ScheInfer: Efficient Inference of Large Language Models with Task Scheduling on Moderate GPUs. (abstract) PRESENTER: Wenxiang Lin |
14:00-15:00 Session 33C: WHPC Special Session: Advances in HPC Computing Applications
Chair:
Neda Ebrahimi Pour (German Aerospace Center (DLR), Germany)
Location: BAR 106
14:00 | Targeted data movement optimizations for emerging heterogeneous supercomputers (abstract) |
14:20 | Efficient Anisotropic Mesh Refinement with Omnitrees ...or How to Get Cat GIFs Into Your Paper (abstract) |
14:40 | From Reactive Debugging to Proactive Detection: AI for Performance-Aware Software Development (abstract) |
15:00-16:00 Coffee Break and PhD Symposium and Poster&Demos Session
The PhD Symposium Posters and the Posters & Demos will be on display in this coffee break.
15:00-16:00 Session 34A: Demos&Poster Session during the Coffee Break
Chairs:
Optimized Parallel Metaheuristics for Big Data Processing on GPUs with Apache Spark (abstract) |
Portable and Scalable FPGA Emulation of a Massive-Parallel Vector Processor (abstract) PRESENTER: Gia Bao Thieu |
Modifying the HyperLedger Fabric Blockchain Architecture to increase throughput and decrease transaction rejections (abstract) |
Time-related effects in the measurement of energy consumption in evolutionary algorithms (abstract) |
ParSolGen (Parallel Solvers Generator) - an automated numerical parallel programs generator for distributed memory parallel computers (abstract) |
Towards Digital Twins of HPC Data Centres Modelling Infrastructure and HPC Systems for IT-Zauber (abstract) |
Fault-Tolerant Distributed Federated Learning with Adaptive Termination Detection (abstract) |
H2O: Holistic Hyper-Parameter Optimization for Large-Scale Deep Neural Network Training (abstract) |
15:00-16:00 Session 34B: PhD Symposium Poster Session during the Coffee Break
Chairs:
Leonel Sousa (INESC-ID, Instituto Superior Técnico, Universidade de Lisboa, Portugal)
Michael Färber (ScaDS.AI & TU Dresden, Germany)
Power Scheduling on Multicore Multiprocessor Systems for Maximizing Throughput and Fairness (abstract) |
Accelerating Gate Sizing using GPU (abstract) PRESENTER: Yi-Hua Chung |
SCOPE: Accelerating ML data pipeline using cloud-based computational storage (abstract) |
Advanced Techniques in Polyhedral Model-Based Compilers for Efficient and Cross-Platform Code Generation on Multicore Processors (abstract) |
CoreWaterfall: a Virtual-Core-Focused Scheduling and Allocation Algorithm for Oversubscribed Virtual Machines (abstract) |
On-the-fly Performance Analysis of Asynchronous Parallel Execution (abstract) |
TH-Pulse: A Study on Hardware-Software Co-Designed Framework for LLM Training and Inference on the Tianhe new-generation supercomputer (abstract) |
DCG-DDQ: A Directed Cyclic Graph Based Task Computing System (abstract) |
A Hybrid DMA-Cache Mechanism to Leverage Memory Bandwidth in Massive-Parallel Processors (abstract) PRESENTER: Gia Bao Thieu |
Boosting Performance of Counting Queries in Machine Learning Applications with a ccNUMA-aware Implementation (abstract) |
EAGER: Energy-Aware 3D Gaussian Splatting on Embedded Parallel Heterogeneous Systems (abstract) PRESENTER: Oscar Ferraz |
AskLLVM: LLVM Code Generation for GPUs for Graph Algorithms (abstract) |
Heterogeneous computing, storage and network infrastructures for medical applications (abstract) |
16:00-17:30 Session 35A: Track 2.2: Scheduling, Resource Management, Cloud, Edge Computing, and Workflows
Chair:
Domenico Talia (University of Calabria, Italy)
Location: BAR 205
16:00 | HAS-GPU: Efficient Hybrid Auto-scaling with Fine-grained GPU Allocation for SLO-aware Serverless Inferences (abstract) PRESENTER: Jianfeng Gu |
16:20 | CGP-Graphless: Towards Efficient Serverless Graph Processing via CPU-GPU Pipelined Collaboration (abstract) PRESENTER: Yiming Sun |
16:40 | Design and Operation of Elastic GPU-pooling on Campus (abstract) |
17:00 | ServerlessRec: Fast Serverless Inference for Embedding-based Recommender Systems with Disaggregated Memory (abstract) |
16:00-17:30 Session 35B: Track 6.3: Stream, Image and Sequence Processing
Chair:
Daniel Cordeiro (University of São Paulo, Brazil)
Location: BAR 218
16:00 | SProBench: Stream Processing Benchmark for High Performance Computing Infrastructure (abstract) PRESENTER: Apurv Deepak Kulkarni |
16:20 | SWBWA: A Highly Efficient NGS Aligner on the New Sunway Architecture (abstract) PRESENTER: Lifeng Yan |
16:40 | Efficient Pyramidal Analysis of Gigapixel Images on a Decentralized Modest Computer Cluster (abstract) |
16:00-17:30 Session 35C: WHPC Special Session: Advances in HPC Computing Applications
Chair:
Neda Ebrahimi Pour (German Aerospace Center (DLR), Germany)
Location: BAR 106
16:00 | Performance optimization of GROMACS on modern Hardware (abstract) |
16:20 | FLEXI: Scale-resolving simulations of compressible turbulence on modern HPC systems (abstract) |
16:40 | Exploring Flow Fields at Scale: GPU-Accelerated Scientific Visualization for Exascale CFD (abstract) |
17:00 | In-Situ Techniques for the Efficient Coupling of Complex Plasma Turbulence Simulations: GENE and GENE-X (abstract) |
Thursday, August 28th
09:00-10:00 Session 36: Keynote 2: Domenico Talia
Chair:
Christian Lengauer (University of Passau, Germany)
Location: BAR SCHÖ
10:00-10:30 Coffee Break
10:30-12:30 Session 37: Best Paper Session
Chair:
Thomas Ludwig (DKRZ, Germany)
Location: BAR SCHÖ
10:30 | Noise injection for performance bottleneck analysis (abstract) PRESENTER: Aurélien Delval |
10:50 | Approximation Bounds for SLACK on Identical Parallel Machines (abstract) PRESENTER: Anthony Dugois |
11:10 | SimPoint+: More Stable, Accurate and Efficient Program Analysis (abstract) PRESENTER: Ruini Xue |
11:30 | AlphaSparseTensor: Discovering Faster Sparse Matrix Multiplication Algorithms on GPUs for LLM Inference (abstract) PRESENTER: Xuanzheng Wang |
11:50 | Wedge-Parallel Triangle Counting for GPUs (abstract) PRESENTER: Jeffrey Spaan |
12:10 | External GPU Biconnected Components (abstract) PRESENTER: Abhijeet Sahu |
12:30-14:00 Lunch Break
14:00-15:30 Session 38A: Track 1.2: Compilers, Optimizations, and Scheduling
Chair:
Lars Schütze (TU Dresden, Germany)
Location: BAR 205
14:00 | CoSF: A Co-Optimization Framework for Operator Splitting and Fusion (abstract) PRESENTER: Wei Li |
14:20 | Scalable Code Generation for RTL Simulation of Deep Learning Accelerators with MLIR (abstract) PRESENTER: Yi-Hua Chung |
14:40 | Scheduling Task and Data Parallelism in Array Languages with Work Assisting (abstract) PRESENTER: Ivo Gabe de Wolff |
15:00 | Polymorphic Higher-Order GPU Kernels (abstract) PRESENTER: Andre Rauber Du Bois |
14:00-15:30 Session 38B: Track 4.1: Scalable AI Optimization and Parallel Training
Chair:
Alessio Orsino (University of Calabria, Italy)
Location: BAR 106
14:00 | Saving Memory via Residual Reduction for DNN Training with Compressed Communication (abstract) PRESENTER: Xinjue Zheng |
14:20 | Interval-Asynchrony: Delimited Intervals of Localised Asynchrony for Fast Parallel SGD (abstract) PRESENTER: Jacob Garby |
14:40 | Robustness of deep learning classification to adversarial input on GPUs: asynchronous parallel accumulation is a source of vulnerability (abstract) |
15:00 | Tutoring LLM into a Better CUDA Optimizer (abstract) PRESENTER: Martin Kruliš |
14:00-15:30 Session 38C: Track 3.2: Architecture
Chair:
Paul Kelly (Imperial College, UK)
Location: BAR 218
14:00 | ParTEE: A Framework for Secure Parallel Computing of RISC-V TEEs (abstract) PRESENTER: Hao Lan |
14:20 | ARM SVE Unleashed: Performance and Insights Across HPC Applications on Nvidia Grace (abstract) PRESENTER: Ruimin Shi |
14:40 | CSGC: Collaborative File System Garbage Collection with Computational Storage (abstract) PRESENTER: Jin Pu |
15:00 | SONet: Towards Practical Online Neural Network for Enhancing Hard-To-Predict Branches (abstract) PRESENTER: Zhenxuan Xiong |
15:30-16:00 Coffee Break
16:00-17:30 Session 39A: Track 3.3: Caching and Memory for ML
Chair:
Fernando Silva (University of Porto, Portugal)
Location: BAR 106
16:00 | CacheC: LLM-based GPU Cache Management to Enhance Kernel Concurrency (abstract) |
16:20 | Cocache: An Accurate And Low-overhead Dynamic Caching Method for GNNs (abstract) PRESENTER: Zhaoyang Zeng |
16:40 | DCI: An Efficient Workload-Aware Dual-Cache Allocation GNN Inference Acceleration System (abstract) |
17:00 | ReSpike: A Co-Design Framework for Evaluating SNNs on ReRAM-based Neuromorphic Processors (abstract) PRESENTER: Kazi Asifuzzaman |
16:00-17:30 Session 39B: Track 2.3: Scheduling, Resource Management, Cloud, Edge Computing, and Workflows
Chair:
Alvaro Luiz Fazenda (Federal University of Sao Paulo (UNIFESP), Brazil)
Location: BAR 205
16:00 | MPLS: Stacking Diverse Layers into One Model for Decentralized Federated Learning (abstract) PRESENTER: Zhiwei Yao |
16:20 | Federated Learning within Global Energy Budget over Heterogeneous Edge Accelerators (abstract) PRESENTER: Roopkatha Banerjee |
16:40 | Auction-based Placement of Functions in the Fog at Scale (abstract) |
17:00 | Bifröst: Peer-to-peer Load-balancing for Function Execution in Agentic AI Systems (abstract) |
16:00-17:30 Session 39C: Track 4.2: Efficient AI Inference and Model Serving at Scale
Chair:
Julio Sahuquillo (Universitat Politècnica de València, Spain)
Location: BAR 218
16:00 | TopServe: Task-Operator Co-Scheduling for Efficient Multi-DNN Inference Serving on GPUs (abstract) PRESENTER: Ao Chen |
16:20 | EFIM: Efficient Serving of LLMs for Infilling Tasks with Improved KV Cache Reuse (abstract) PRESENTER: Tianyu Guo |
16:40 | 2:4 Pruning on Edge Devices: Performance, Energy Efficiency and Accuracy (abstract) PRESENTER: Nicolás Hernández González |
17:00 | Light-DiT: An Importance-Aware Dynamic Compression Framework for Diffusion Transformers (abstract) PRESENTER: Cheng Gu |
Friday, August 29th
09:00-10:00 Session 40: Keynote 3: Florina Ciorba
Chair:
Diana Goehringer (TU Dresden, Germany)
Location: BAR SCHÖ
10:10-10:30 Coffee Break
10:30-12:00 Session 42A: Track 2.4: Scheduling, Resource Management, Cloud, Edge Computing, and Workflows
Chair:
Carlos Barrios Hernandez (SC3UIS-CAGE, LIG/INRIA-DataMove, CITI/INRIA -Sindy, Colombia)
Location: BAR 205
10:30 | DynoInfer: Adaptive Resource Orchestration for LLM Inference on Resource-Constrained PCs (abstract) PRESENTER: Yunling Chen |
10:50 | Container Workload Prediction Using Deep Domain Adaptation in Transfer Learning (abstract) |
11:10 | KarmaPM: Reward-Driven Power Manager (abstract) |
11:30 | A Sparsity Predicting Approach for General Large Language Models via Activation Pattern Clustering (abstract) PRESENTER: Nobel Dhar |
10:30-12:00 Session 42B: Track 4.3: Distributed systems, Compression, and Federated Applications
Chair:
Josef Weidendorfer (Technical University of Munich, Germany)
Location: BAR 106
10:30 | DiffNO: Neural Operator Learning using Physically Structured Constrained Diffusion Model (abstract) |
10:50 | Scalable Compression of Massive Data Collections on HPC Systems (abstract) PRESENTER: Loris Belcastro |
11:10 | On-Device Federated Learning for Remote Alpine Livestock Monitoring (abstract) PRESENTER: Sabtain Ahmad |
11:30 | IAUG: Accelerating Augmentation with Importance Sampling in Deep Neural Network Training (abstract) PRESENTER: Germaine Nyatsikor |
10:30-12:00 Session 42C: Track 5.1: Theory and Algorithms
Chair:
Jože M. Rožanec (Jožef Stefan Institute, Slovenia)
Location: BAR 218
10:30 | Cache Management for Mixture-of-Experts LLMs (abstract) PRESENTER: Adrien Obrecht |
10:50 | Near-optimal contraction strategies for the scalar product in the tensor-train format (abstract) PRESENTER: Atte Torri |
11:10 | Supervised Distributed Computing (abstract) PRESENTER: Julian Werthmann |
11:30 | Partial Detectors Versus Replication To Cope With Silent Errors (abstract) PRESENTER: Alix Tremodeux |
10:30-12:00 Session 42D: Track 6.4: Graph Algorithms and Linear Algebra
Chair:
Achim Basermann (German Aerospace Center (DLR), Simulation and Software Technology, Germany)
Location: BAR I88
10:30 | Uniform Dense Blocking for Efficient Sparse LU Factorization in First-principles Materials Simulation (abstract) PRESENTER: Chao Wang |
10:50 | Efficient Task Graph Scheduling for Parallel QR Factorization in SLSQP (abstract) |
11:10 | ScaleRunner: A Fast MPI-based Random Walk Engine for Multi-CPU Systems (abstract) PRESENTER: Florian Willich |
12:00-13:30 Lunch Break
13:30-14:30 Session 43A: Track 2.5: Scheduling, Resource Management, Cloud, Edge Computing, and Workflows
Chair:
Martin Schulz (Technical University Munich, Germany)
Location: BAR 205
13:30 | Leveraging Expert Usage to Speed up LLM Inference with Expert Parallelism (abstract) PRESENTER: Olivier Beaumont |
13:50 | Priority-BF: a Task Manager for Priority-Based Scheduling (abstract) PRESENTER: Ana Gainaru |
14:10 | Green Scheduling on the Edge (abstract) PRESENTER: Joachim Cendrier |
13:30-14:30 Session 43B: Track 5.2: Theory and Algorithms
Chair:
Lester Kalms (TU Dresden, Germany)
Location: BAR 218
13:30 | Byzantine-Tolerant Consensus in GPU-Inspired Shared Memory (abstract) |
13:50 | Partitioning In-Place on Massively Parallel Systems (abstract) |
13:30-14:30 Session 43C: Track 6.5: GPU and Quantum Systems
Chair:
Florina M. Ciorba (University of Basel, Switzerland)
Location: BAR 106
13:30 | Disaggregated Design for GPU-Based Volumetric Data Structures (abstract) |
13:50 | Quantum Delta Encoding: Optimizing Data Storage on Quantum Computers with Resource Efficiency (abstract) PRESENTER: Jiale Zhang |
14:10 | SimPart: A Simple Yet Effective Replication-aided Partitioning Algorithm for Logic Simulation on GPU (abstract) PRESENTER: Yi-Hua Chung |