EURO-PAR 2024: 30TH INTERNATIONAL EUROPEAN CONFERENCE ON PARALLEL AND DISTRIBUTED COMPUTING
PROGRAM

Days: Wednesday, August 28th | Thursday, August 29th | Friday, August 30th

Wednesday, August 28th

09:30-10:30 Session 2: Keynote: Mateo Valero

European supercomputers: buying versus building 

In 2017, Europe created the EuroHPC initiative and its associated legal funding structure, the “EuroHPC JU” (Joint Undertaking), with two main objectives. The first objective is to acquire, build and deploy world-class high performance computing (HPC) infrastructure across Europe. The second objective is to conduct research and development to build HPC hardware manufactured in Europe, as well as the applications (software) that would run on future locally developed European supercomputers. This talk will cover both objectives in detail. On the one hand, Europe has recently committed a substantial amount of money to the first goal; in the June 2024 Top-500 list, for example, 9 of the Top-20 supercomputers are European. We will go deeper and describe the two main components of the heterogeneous MareNostrum 5 supercomputer, listed separately in positions 8 and 22 of the June 2024 Top-500. Installed at our Barcelona site, MareNostrum 5 is a good illustration of the challenges of building a contemporary supercomputer; for example, its space requirements meant that BSC could no longer install it within our Church. MareNostrum 5 therefore had to be installed in a larger space, while the Church will house our first quantum computer, thus fulfilling the prophecy made by Dan Brown in his novel "Origin".

On the other hand, and as the second part of my talk, I will describe the European approach to designing general-purpose Made-in-Europe processors and accelerators that leverage the RISC-V open Instruction Set Architecture (ISA). Currently, this approach is embodied in several large-scale European research projects, namely the European Processor Initiative, EUPilot and eProcessor, as well as some nationally funded projects. I will briefly describe these projects, including the proof-of-concept chips that successfully boot Linux. Finally, I will look ahead and describe the initiatives that Europe and BSC are pursuing with the main goal of developing software and hardware for the MareNostrum 6 supercomputer, which is expected to become a reality in 2027-2028.

Location: Auditorium
10:30-11:00 Coffee Break
11:00-13:00 Session 3: Best Paper Session
Location: Auditorium
11:00
Bringing auto-tuning to HIP: Analysis of tuning impact and difficulty on AMD and Nvidia GPUs (Artifact)
PRESENTER: Stijn Heldens
11:20
A 1.25(1+ε)-Approximation Algorithm for Scheduling with Rejection Costs Proportional to Processing Times (Artifact)
PRESENTER: Esragul Korkmaz
11:40
(re)Assessing PiM Effectiveness for Sequence Alignment
PRESENTER: Matei Ripeanu
12:00
LogRCA: Log-based Root Cause Analysis for Distributed Services
12:20
How to Relax Instantly: Elastic Relaxation of Concurrent Data Structures (Artifact)
PRESENTER: Kåre von Geijer
12:40
Code Generation for Octree-Based Multigrid Solvers with Fused Higher-Order Interpolation and Communication
13:00-14:00 Lunch Break
15:00-15:30 Coffee Break
15:30-17:30 Session 5A: Architectures and Accelerators (I)
Location: -1.A.04
15:30
Optimizing Communication for Latency Sensitive HPC Applications on up to 48 FPGAs Using ACCL
PRESENTER: Marius Meyer
15:50
A High-Performance Collective I/O Framework Leveraging Node-Local Persistent Memory
PRESENTER: Xuechen Zhang
16:10
Efficient RNIC Cache Side-channel Attack Detection through DPU-driven Architecture
PRESENTER: Yunkun Liao
16:30
Harnessing Data Movement Strategies to Optimize Performance-Energy Efficiency of Oil & Gas Simulations in HPC
PRESENTER: Arthur Lorenzon
15:30-17:30 Session 5B: Theory and Algorithms (I)
Location: -1.A.07
15:30
Boolean Matrix Multiplication for Highly Clustered Data on the Congested Clique
15:50
Accelerated Block-Sparsity-Aware Matrix Reordering for Leveraging Tensor Cores in Sparse Matrix-Multivector Multiplication (Artifact)
PRESENTER: Eunji Lee
16:10
Communication Minimizing Toom-Cook Algorithms
PRESENTER: Yuval Spiizer
16:30
Reduced-Precision and Reduced-Exponent Formats for Adaptive-Precision Sparse Matrix-Vector Product
PRESENTER: Roméo Molina
15:30-17:30 Session 5C: Multidisciplinary, Domain-Specific and Applied Parallel and Distributed Computing (I)
Location: -1.A.01
15:30
A Joint Approach to Local Updating and Gradient Compression for Efficient Asynchronous Federated Learning
PRESENTER: Jiajun Song
15:50
GPU Cache System for COMPSs: A Task-Based Distributed Computing Framework
PRESENTER: Cristian Tatu
16:10
Distributed Simulation for Digital Twins of Large-Scale Real-World DiffServ-Based Networks
PRESENTER: Nan Zhang
16:30
DF* PageRank: Incrementally Expanding Approaches for Updating PageRank on Dynamic Graphs (Artifact)
PRESENTER: Subhajit Sahu
16:50
Investigating Portability in Chapel for Tree-based Optimization on GPU-powered Clusters
PRESENTER: Tiago Carneiro
15:30-17:30 Session 5D: Data analytics, AI, and Computational Science (I)
Location: -1.A.06
15:30
WActiGrad: Structured Pruning for Efficient Finetuning and Inference of Large Language Models on AI Accelerators
PRESENTER: Murali Emani
15:50
VeriChroma: Ownership Verification for Federated Models via RGB Filters
PRESENTER: Zhi Lu
16:10
Disttack: Graph Adversarial Attacks Toward Distributed GNN Training
PRESENTER: Yuxiang Zhang
16:30
Optimizing Federated Learning using Remote Embeddings for Graph Neural Networks
PRESENTER: Pranjal Naman
16:50
CSIMD: Cross-Search Algorithm with Improved Multi-Dimensional Dichotomy for Micro-batch-based Pipeline Parallel Training in DNN
PRESENTER: Guangyao Zhou
Thursday, August 29th

09:00-10:00 Session 6: Keynote: Franck Cappello

AuroraGPT: Rationale, Challenges and Development of an AI Research Assistant

Innovative methods, new instruments, disruptive techniques, and groundbreaking technologies have led to significant leaps in scientific progress. The increasingly powerful Large Language Models (LLMs) released each month have already sped up research activities such as concept explanation, literature search, and summarization. The transformative potential of AI in research, and of foundation models in particular, raises important questions about their performance in scientific activities, their applicability in different contexts, and their ethics. In this talk, I will first explore the notion of AI research assistants and then discuss the gap between an ideal AI research assistant and current LLMs, focusing on HPC and parallel computing research problems. This gap motivates the development of research-oriented LLMs. AuroraGPT is being developed as an open foundation model trained specifically on scientific data to explore solutions toward effective AI research assistants. I will describe the activities, challenges, and progress of the different groups developing the key aspects of AuroraGPT. In particular, I will focus on the critical and difficult task of evaluating LLMs' scientific skills, safety, and trustworthiness.

Location: Auditorium
10:00-10:30 Coffee Break
10:30-12:30 Session 7A: Architectures and Accelerators (II)
Location: -1.A.04
10:30
FakeGuard: A Novel Accelerator Architecture for Deepfake Detection Networks
PRESENTER: Xingbin Wang
10:50
ImSPU: Implicit Sharing of Computation Resources between Vector and Scalar Processing Units
PRESENTER: Hongbing Tan
11:10
ADE-HGNN: Accelerating HGNNs through Attention Disparity Exploitation
PRESENTER: Dengke Han
11:30
Fault tolerance in the Expand Ad-Hoc parallel file system (Artifact)
11:50
Parallel Writing of Nested Data in Columnar Formats (Artifact)
PRESENTER: Jonas Hahnfeld
10:30-12:30 Session 7B: Data analytics, AI, and Computational Science (II)
Location: -1.A.06
10:30
Improving Generalization and Personalization in Long-Tailed Federated Learning via Classifier Retraining
PRESENTER: Yuhang Li
10:50
FLUK: Protecting Federated Learning against Malicious Clients for Internet of Vehicles
PRESENTER: Mengde Zhu
11:10
GDL-GNN: Applying GPU Dataloading of Large Datasets for Graph Neural Network Inference
PRESENTER: Haoran Dang
11:30
Quartet: A Holistic Hybrid Parallel Framework for Training Large Language Models
PRESENTER: Weigang Zhang
11:50
Inference with Transformer Encoders on ARM and RISC-V Multicore Processors
10:30-12:30 Session 7C: Multidisciplinary, Domain-Specific and Applied Parallel and Distributed Computing (II)
Location: Auditorium
10:30
MPR: An MPI Framework for Distributed Self-Adaptive Stream Processing
PRESENTER: Júnior Löff
10:50
TaroRTL: Accelerating RTL Simulation using Coroutine-based Heterogeneous Task Graph Scheduling
PRESENTER: Tsung-Wei Huang
11:10
Combining Compression and Prefetching to Improve Checkpointing for Inverse Seismic Problems in GPUs
PRESENTER: Thiago Maltempi
11:30
Cloud-native GPU-enabled architecture for parallel video encoding
11:50
VLASPH: Smoothed Particle Hydrodynamics on VLA SIMD Architectures
PRESENTER: Xiaokang Fan
10:30-12:30 Session 7D: Scheduling, Resource Management, Cloud, Edge Computing, and Workflows (I)
Location: -1.A.01
10:30
Solving the Restricted Assignment Problem to Schedule Multi-Get Requests in Key-Value Stores (Artifact)
PRESENTER: Anthony Dugois
10:50
Resource-Aware Heterogeneous Federated Learning with Specialized Local Models
PRESENTER: Sixing Yu
11:10
Makespan Minimization for Scheduling on Heterogeneous Platforms with Precedence Constraints
11:30
Deadline-driven Enhancements and Response Time Analysis of ROS2 Multi-threaded Executors
PRESENTER: Zhengda Wu
11:50
Light-weight prediction for improving energy consumption in HPC platforms (Artifact)
12:30-13:30 Lunch Break
13:30-15:30 Session 8A: Scheduling, Resource Management, Cloud, Edge Computing, and Workflows (II)
Location: -1.A.01
13:30
Efficient Coupling Streaming AI and Ensemble Simulations on HPC Clusters
PRESENTER: Dan Huang
13:50
sAirflow: Adopting Serverless in a Legacy Workflow Scheduler
PRESENTER: Paweł Żuk
14:10
Optimizing Service Replication and Placement for IoT Applications in Fog Computing Systems
PRESENTER: Farah Ait Salaht
14:30
Scheduling distributed I/O resources in HPC systems
PRESENTER: Alexis Bandet
14:50
Node Bundle Scheduling: An Ultra-Low Latency Traffic Scheduling Algorithm for TAS-based Time-Sensitive Networks
PRESENTER: Qian Yang
13:30-15:30 Session 8B: Architectures and Accelerators (III)
Location: -1.A.05
13:30
Compact Parallel Hash Tables on the GPU (Artifact)
PRESENTER: Steef Hegeman
13:50
Hybrid Congestion Control for BXI-based Interconnection Networks
14:10
Exploring processor micro-architectures optimised for BLAS3 micro-kernels
PRESENTER: Stepan Nassyr
14:30
Watt: A Write-optimized RRAM-based Accelerator for Attention
PRESENTER: Xuan Zhang
13:30-15:30 Session 8C: Theory and Algorithms (II)
Location: -1.A.06
13:30
A Fast Wait-Free Solution to Read-Reclaim Races in Reference Counting (Artifact)
13:50
QClique: Optimizing Performance and Accuracy in Maximum Weighted Clique
PRESENTER: Qasim Abbas
14:10
ALZI: An Improved Parallel Algorithm for Finding Connected Components in Large Graphs
PRESENTER: Maleq Khan
14:30
Mixed precision randomized low-rank approximation with GPU tensor cores
PRESENTER: Matthieu Robeyns
14:50
GPU-Accelerated BFS for Dynamic Networks
PRESENTER: Filippo Ziche
13:30-15:30 Session 8D: Programming, Compilers and Performance (II)
Location: -1.A.04
13:30
Deconstructing HPL-MxP benchmark: a numerical perspective
PRESENTER: Eric Petit
13:50
OMPGPT: A Generative Pre-trained Transformer Model for OpenMP
14:10
ImageMap: Enabling Efficient Mapping from Image Processing DSL to CGRA
PRESENTER: Bizhao Shi
14:30
Predicting GPU kernel's performance on upcoming architectures
PRESENTER: Lucas Van Lanker
14:50
A Mechanism to Generate Interception Based Tools for HPC Libraries
PRESENTER: Bengisu Elis
15:30-16:00 Coffee Break
16:00-17:30 Session 9A: Data analytics, AI, and Computational Science (III)
Location: -1.A.06
16:00
PEANUTS: A Persistent Memory-Based Network Unilateral Transfer System for Enhanced MPI-IO Data Transfer (Artifact)
PRESENTER: Kohei Hiraga
16:20
Asymmetric Coded Distributed Computation for Resilient Prediction Serving Systems
PRESENTER: Lin Wang
16:40
Athena: Add More Intelligence to RMT-based Network Data Plane with Low-bit Quantization
PRESENTER: Yunkun Liao
17:00
Lightweight Byzantine-Robust and Privacy-Preserving Federated Learning
PRESENTER: Zhi Lu
17:20
FedGG: Leveraging Generative Adversarial Networks and Gradient Smoothing for Privacy Protection in Federated Learning
PRESENTER: Shuchun Xu
16:00-17:30 Session 9B: Multidisciplinary, Domain-Specific and Applied Parallel and Distributed Computing (III)
Location: -1.A.01
16:00
Vectorizing Sparse Blocks of Graph Matrices for SpMV
PRESENTER: Yuang Chen
16:20
On the use of hybrid computing for accelerating EEG preprocessing
PRESENTER: L. Felipe Romero
16:40
AdapCK: Optimizing I/O for Checkpointing on Large-scale High Performance Computing Systems
PRESENTER: Jie Jia
17:00
Pipe-AGCM: A Fine-grain Pipelining Scheme for Optimizing the Parallel Atmospheric General Circulation Model
PRESENTER: Dazheng Liu
16:00-17:30 Session 9C: Scheduling, Resource Management, Cloud, Edge Computing, and Workflows (III)
Location: -1.A.05
16:00
Hurry: Dynamic Collaborative Framework For Low-orbit Mega-Constellation Data Downloading
PRESENTER: Handong Luo
16:20
Context-aware Runtime Type Prediction for Heterogeneous Microservices
PRESENTER: Yibing Lin
16:40
PriCE: Privacy-Preserving and Cost-Effective Scheduling for Parallelizing the Large Medical Image Processing Workflow over Hybrid Clouds
PRESENTER: Yuandou Wang
17:00
EKRM: Efficient Key-Value Retrieval Method to Reduce Data Lookup Overhead for Redis
PRESENTER: Yiming Yao
18:45-20:00 Walking Tour

Did you know that Madrid was a relatively small town, practically unknown outside of Spain, before the year 1561? The city’s fortunes changed that year, when it burst onto the scene of European politics by becoming the permanent capital of Spain. The dynasty at the head of this change was the Habsburgs, known in Spain as the House of Austria, a family that ruled the country and much of the known world from the early 16th century until 1700.

The historic center of Madrid was built up predominantly during the reign of that same dynasty, and this fascinating walking tour of Madrid de los Austrias takes you through that area, giving you the best introduction to the Spanish capital.

Friday, August 30th

09:00-10:00 Session 10: Keynote: İlkay Altıntaş

Bridging the Data Gaps to Democratize AI in Science, Education and Society

The democratization of Artificial Intelligence (AI) requires an ecosystem in which data and research infrastructure are seamlessly integrated and universally accessible. This talk gives an overview of why these components must be bridged through robust services, enabling an inclusive AI landscape that empowers diverse research communities and domains. The National Data Platform (NDP) aims to lower the barriers to entry for AI research and applications through an integrated-services approach that streamlines AI workflows, from data acquisition to model deployment. This approach underscores the importance of open, extensible, and equitable systems in advancing the capabilities of AI, ultimately contributing to the resolution of grand scientific and societal challenges. Through real case studies that leverage open data platforms and scalable research infrastructure, the talk will highlight the role of composable systems and services in NDP as the foundation of a platform that empowers users from all backgrounds to engage in meaningful research, learning, and discovery.

Location: Auditorium
10:00-10:30 Coffee Break
10:30-12:30 Session 11A: WHPC Session
Location: Auditorium
10:30
Making easier the life-cycle management of complex application workflows
11:10
Pre-Scheduling of Affine Loops for HLS Pipelining
11:40
Evaluation of CPU constraining mechanisms in the LHC ALICE experiment Grid
10:30-12:30 Session 11B: Industrial Session
Location: -1.A.01
10:30
Supporting HPC Centers: challenges, horror stories and best practices
11:00
ParaTools Pro for E4S
11:30
E4 at the forefront of European HPC
12:30-13:30 Lunch Break
13:30-14:50 Session 12A: Programming, Compilers and Performance (I)
Location: -1.A.05
13:30
Efficient Code Region Characterization through Automatic Performance Counters Reduction using Machine Learning Techniques
13:50
FlexiGran: Flexible Granularity Locking in Hierarchies
14:10
ESIMD GPU implementations of Deep Learning Sparse Matrix Kernels
13:30-14:50 Session 12B: Multidisciplinary, Domain-Specific and Applied Parallel and Distributed Computing (IV)
Location: -1.A.01
13:30
Accelerating Stencil Computation with Fully Homomorphic Encryption Using GPU
PRESENTER: Pei Li
13:50
A Framework for Automated Parallel Execution of Scientific Multi-Workflow Applications in the Cloud with Work Stealing
14:10
Accelerating Large-Scale Sparse LU Factorization for RF Circuit Simulation
PRESENTER: Guofeng Feng
13:30-14:50 Session 12C: Scheduling, Resource Management, Cloud, Edge Computing, and Workflows (IV)
Location: -1.A.06
13:30
DProbe: Profiling and Predicting Multi-Tenant Deep Learning Workloads for GPU Resource Scaling
PRESENTER: Zechun Zhou
13:50
Towards High-Performance Transactions via Hierarchical Blockchain Sharding
PRESENTER: Haibo Tang
14:10
Automated Data Management and Learning-based Scheduling for Ray-based Hybrid HPC-Cloud Systems
PRESENTER: Tingkai Liu
13:30-14:50 Session 12D: Architectures and Accelerators (IV)
Location: -1.A.04
13:30
PCTC: Hardware and Software Co-Design for Pruned Capsule Networks on Tensor Cores
PRESENTER: Ehsan Atoofian
13:50
A Folded Computation-in-Memory Accelerator for Fast Polynomial Multiplication in BIKE
PRESENTER: Zewen Ye
14:10
MEPAD: A Memory-efficient Parallelized Direct Convolution Algorithm for Deep Neural Networks
PRESENTER: Leandro Fiorin