Days: Wednesday, August 28th | Thursday, August 29th | Friday, August 30th
European supercomputers: buying versus building
In 2017, Europe created the EuroHPC initiative and its associated legal funding structure, the “EuroHPC JU” Joint Undertaking, with two main objectives. The first objective is to acquire, build and deploy world-class high performance computing (HPC) infrastructure across Europe. The second objective is to conduct research and development to build HPC hardware manufactured in Europe, as well as the applications (software) that would run on future locally developed European supercomputers. This talk will cover both objectives in detail. On the one hand, Europe has recently committed a substantial amount of money to the first goal. For example, in the June 2024 Top-500 list, 9 of the Top-20 supercomputers are from Europe. We will go deeper and describe the two main components of the heterogeneous MareNostrum 5 supercomputer, listed separately in positions 8 and 22 of the June 2024 Top-500. Installed at our Barcelona site, MareNostrum 5 is a good illustration of the challenges of building a contemporary supercomputer; for example, space requirements dictated that BSC could no longer house it within our Church. MareNostrum 5 therefore had to be installed in a larger space, while the Church will be used to install our first Quantum Computer, thus fulfilling the prophecy made by Dan Brown in his book "Origin".
On the other hand, and as the second part of my talk, I will describe the European approach to designing general-purpose Made-in-Europe processors and accelerators leveraging the RISC-V open Instruction Set Architecture (ISA). Currently, this approach is embodied in several large-scale European research projects, namely the European Processor Initiative, EUPilot, and eProcessor, as well as some nationally funded projects. I will briefly describe these projects, including the proof-of-concept chips that successfully boot Linux. I will then hint at the future and describe the initiatives that Europe and the BSC are pursuing with the main goal of developing software and hardware for the MareNostrum 6 supercomputer, which should be a reality in 2027-2028.
11:00 | Bringing auto-tuning to HIP: Analysis of tuning impact and difficulty on AMD and Nvidia GPUs (Artifact) (abstract) PRESENTER: Stijn Heldens |
11:20 | A 1.25(1+ε)-Approximation Algorithm for Scheduling with Rejection Costs Proportional to Processing Times (Artifact) (abstract) PRESENTER: Esragul Korkmaz |
11:40 | (re)Assessing PiM Effectiveness for Sequence Alignment (abstract) PRESENTER: Matei Ripeanu |
12:00 | LogRCA: Log-based Root Cause Analysis for Distributed Services (abstract) PRESENTER: Thorsten Wittkopp |
12:20 | How to Relax Instantly: Elastic Relaxation of Concurrent Data Structures (Artifact) (abstract) PRESENTER: Kåre von Geijer |
12:40 | Code Generation for Octree-Based Multigrid Solvers with Fused Higher-Order Interpolation and Communication (abstract) PRESENTER: Richard Angersbach |
15:30 | Optimizing Communication for Latency Sensitive HPC Applications on up to 48 FPGAs Using ACCL (abstract) PRESENTER: Marius Meyer |
15:50 | A High-Performance Collective I/O Framework Leveraging Node-Local Persistent Memory (abstract) PRESENTER: Xuechen Zhang |
16:10 | Efficient RNIC Cache Side-channel Attack Detection through DPU-driven Architecture (abstract) PRESENTER: Yunkun Liao |
16:30 | Harnessing Data Movement Strategies to Optimize Performance-Energy Efficiency of Oil & Gas Simulations in HPC (abstract) PRESENTER: Arthur Lorenzon |
15:30 | Boolean Matrix Multiplication for Highly Clustered Data on the Congested Clique (abstract) |
15:50 | Accelerated Block-Sparsity-Aware Matrix Reordering for Leveraging Tensor Cores in Sparse Matrix-Multivector Multiplication (Artifact) (abstract) PRESENTER: Eunji Lee |
16:10 | Communication Minimizing Toom-Cook Algorithms (abstract) PRESENTER: Yuval Spiizer |
16:30 | Reduced-Precision and Reduced-Exponent Formats for Adaptive-Precision Sparse Matrix-Vector Product (abstract) PRESENTER: Roméo Molina |
15:30 | A Joint Approach to Local Updating and Gradient Compression for Efficient Asynchronous Federated Learning (abstract) PRESENTER: Jiajun Song |
15:50 | GPU Cache System for COMPSs: A Task-Based Distributed Computing Framework (abstract) PRESENTER: Cristian Tatu |
16:10 | Distributed Simulation for Digital Twins of Large-Scale Real-World DiffServ-Based Networks (abstract) PRESENTER: Nan Zhang |
16:30 | DF* PageRank: Incrementally Expanding Approaches for Updating PageRank on Dynamic Graphs (Artifact) (abstract) PRESENTER: Subhajit Sahu |
16:50 | Investigating Portability in Chapel for Tree-based Optimization on GPU-powered Clusters (abstract) PRESENTER: Tiago Carneiro |
15:30 | WActiGrad: Structured Pruning for Efficient Finetuning and Inference of Large Language Models on AI Accelerators (abstract) PRESENTER: Murali Emani |
15:50 | VeriChroma: Ownership Verification for Federated Models via RGB Filters (abstract) PRESENTER: Zhi Lu |
16:10 | Disttack: Graph Adversarial Attacks Toward Distributed GNN Training (abstract) PRESENTER: Yuxiang Zhang |
16:30 | Optimizing Federated Learning using Remote Embeddings for Graph Neural Networks (abstract) PRESENTER: Pranjal Naman |
16:50 | CSIMD: Cross-Search Algorithm with Improved Multi-Dimensional Dichotomy for Micro-batch-based Pipeline Parallel Training in DNN (abstract) PRESENTER: Guangyao Zhou |
AuroraGPT: Rationale, Challenges and Development of an AI Research Assistant
Innovative methods, new instruments, disruptive techniques, and groundbreaking technologies have led to significant leaps in scientific progress. The increasingly powerful Large Language Models (LLMs) released each month have already sped up research activities such as concept explanation, literature search, and summarization. The transformative potential of AI in research activities, and of foundation models in particular, raises important questions about their performance in science activities, their potential application in different contexts, and their ethics. In this talk, I will first explore the notion of AI research assistants and then discuss the gap between an ideal AI research assistant and the current LLMs, focusing on HPC and parallel computing research problems. The gap motivates the development of research-oriented LLMs. AuroraGPT is being developed as an open foundation model trained specifically with scientific data to explore solutions toward the realization of effective AI research assistants. I will describe the activity, challenges, and progress of the different groups developing the key aspects of AuroraGPT. I will particularly focus on the critical and hard task of evaluating LLMs' scientific skills, safety, and trustworthiness.
10:30 | FakeGuard: A Novel Accelerator Architecture for Deepfake Detection Networks (abstract) PRESENTER: Xingbin Wang |
10:50 | ImSPU: Implicit Sharing of Computation Resources between Vector and Scalar Processing Units (abstract) PRESENTER: Hongbing Tan |
11:10 | ADE-HGNN: Accelerating HGNNs through Attention Disparity Exploitation (abstract) PRESENTER: Dengke Han |
11:30 | Fault tolerant in the Expand Ad-Hoc parallel file system (Artifact) (abstract) PRESENTER: Dario Muñoz-Muñoz |
11:50 | Parallel Writing of Nested Data in Columnar Formats (Artifact) (abstract) PRESENTER: Jonas Hahnfeld |
10:30 | Improving Generalization and Personalization in Long-Tailed Federated Learning via Classifier Retraining (abstract) PRESENTER: Yuhang Li |
10:50 | FLUK: Protecting Federated Learning against Malicious Clients for Internet of Vehicles (abstract) PRESENTER: Mengde Zhu |
11:10 | GDL-GNN: Applying GPU Dataloading of Large Datasets for Graph Neural Network Inference (abstract) PRESENTER: Haoran Dang |
11:30 | Quartet: A Holistic Hybrid Parallel Framework for Training Large Language Models (abstract) PRESENTER: Weigang Zhang |
11:50 | Inference with Transformer Encoders on ARM and RISC-V Multicore Processors (abstract) PRESENTER: Enrique S. Quintana-Orti |
10:30 | MPR: An MPI Framework for Distributed Self-Adaptive Stream Processing (abstract) PRESENTER: Júnior Löff |
10:50 | TaroRTL: Accelerating RTL Simulation using Coroutine-based Heterogeneous Task Graph Scheduling (abstract) PRESENTER: Tsung-Wei Huang |
11:10 | Combining Compression and Prefetching to Improve Checkpointing for Inverse Seismic Problems in GPUs (abstract) PRESENTER: Thiago Maltempi |
11:30 | Cloud-native GPU-enabled architecture for parallel video encoding (abstract) PRESENTER: Andoni Salcedo Navarro |
11:50 | VLASPH: Smoothed Particle Hydrodynamics on VLA SIMD Architectures (abstract) PRESENTER: Xiaokang Fan |
10:30 | Solving the Restricted Assignment Problem to Schedule Multi-Get Requests in Key-Value Stores (Artifact) (abstract) PRESENTER: Anthony Dugois |
10:50 | Resource-Aware Heterogeneous Federated Learning with Specialized Local Models (abstract) PRESENTER: Sixing Yu |
11:10 | Makespan Minimization for Scheduling on Heterogeneous Platforms with Precedence Constraints (abstract) PRESENTER: Giorgio Lucarelli |
11:30 | Deadline-driven Enhancements and Response Time Analysis of ROS2 Multi-threaded Executors (abstract) PRESENTER: Zhengda Wu |
11:50 | Light-weight prediction for improving energy consumption in HPC platforms (Artifact) (abstract) PRESENTER: Danilo Carastan-Santos |
13:30 | Efficient Coupling Streaming AI and Ensemble Simulations on HPC Clusters (abstract) PRESENTER: Dan Huang |
13:50 | sAirflow: Adopting Serverless in a Legacy Workflow Scheduler (abstract) PRESENTER: Paweł Żuk |
14:10 | Optimizing Service Replication and Placement for IoT Applications in Fog Computing Systems (abstract) PRESENTER: Farah Ait Salaht |
14:30 | Scheduling distributed I/O resources in HPC systems (abstract) PRESENTER: Alexis Bandet |
14:50 | Node Bundle Scheduling: An Ultra-Low Latency Traffic Scheduling Algorithm for TAS-based Time-Sensitive Networks (abstract) PRESENTER: Qian Yang |
13:30 | Compact Parallel Hash Tables on the GPU (Artifact) (abstract) PRESENTER: Steef Hegeman |
13:50 | Hybrid Congestion Control for BXI-based Interconnection Networks (abstract) PRESENTER: Gabriel Gomez-Lopez |
14:10 | Exploring processor micro-architectures optimised for BLAS3 micro-kernels (abstract) PRESENTER: Stepan Nassyr |
14:30 | Watt: A Write-optimized RRAM-based Accelerator for Attention (abstract) PRESENTER: Xuan Zhang |
13:30 | A Fast Wait-Free Solution to Read-Reclaim Races in Reference Counting (Artifact) (abstract) PRESENTER: Ivo Gabe de Wolff |
13:50 | QClique: Optimizing Performance and Accuracy in Maximum Weighted Clique (abstract) PRESENTER: Qasim Abbas |
14:10 | ALZI: An Improved Parallel Algorithm for Finding Connected Components in Large Graphs (abstract) PRESENTER: Maleq Khan |
14:30 | Mixed precision randomized low-rank approximation with GPU tensor cores (abstract) PRESENTER: Matthieu Robeyns |
14:50 | GPU-Accelerated BFS for Dynamic Networks (abstract) PRESENTER: Filippo Ziche |
13:30 | Deconstructing HPL-MxP benchmark: a numerical perspective (abstract) PRESENTER: Eric Petit |
13:50 | OMPGPT: A Generative Pre-trained Transformer Model for OpenMP (abstract) PRESENTER: Arijit Bhattacharjee |
14:10 | ImageMap: Enabling Efficient Mapping from Image Processing DSL to CGRA (abstract) PRESENTER: Bizhao Shi |
14:30 | Predicting GPU kernel's performance on upcoming architectures (abstract) PRESENTER: Lucas Van Lanker |
14:50 | A Mechanism to Generate Interception Based Tools for HPC Libraries (abstract) PRESENTER: Bengisu Elis |
16:00 | PEANUTS: A Persistent Memory-Based Network Unilateral Transfer System for Enhanced MPI-IO Data Transfer (Artifact) (abstract) PRESENTER: Kohei Hiraga |
16:20 | Asymmetric Coded Distributed Computation for Resilient Prediction Serving Systems (abstract) PRESENTER: Lin Wang |
16:40 | Athena: Add More Intelligence to RMT-based Network Data Plane with Low-bit Quantization (abstract) PRESENTER: Yunkun Liao |
17:00 | Lightweight Byzantine-Robust and Privacy-Preserving Federated Learning (abstract) PRESENTER: Zhi Lu |
17:20 | FedGG: Leveraging Generative Adversarial Networks and Gradient Smoothing for Privacy Protection in Federated Learning (abstract) PRESENTER: Shuchun Xu |
16:00 | Vectorizing Sparse Blocks of Graph Matrices for SpMV (abstract) PRESENTER: Yuang Chen |
16:20 | On the use of hybrid computing for accelerating EEG preprocessing (abstract) PRESENTER: L. Felipe Romero |
16:40 | AdapCK: Optimizing I/O for Checkpointing on Large-scale High Performance Computing Systems (abstract) PRESENTER: Jie Jia |
17:00 | Pipe-AGCM: A Fine-grain Pipelining Scheme for Optimizing the Parallel Atmospheric General Circulation Model (abstract) PRESENTER: Dazheng Liu |
16:00 | Hurry: Dynamic Collaborative Framework For Low-orbit Mega-Constellation Data Downloading (abstract) PRESENTER: Handong Luo |
16:20 | Context-aware Runtime Type Prediction for Heterogeneous Microservices (abstract) PRESENTER: Yibing Lin |
16:40 | PriCE: Privacy-Preserving and Cost-Effective Scheduling for Parallelizing the Large Medical Image Processing Workflow over Hybrid Clouds (abstract) PRESENTER: Yuandou Wang |
17:00 | EKRM: Efficient Key-Value Retrieval Method to Reduce Data Lookup Overhead for Redis (abstract) PRESENTER: Yiming Yao |
Did you know that Madrid was a relatively small town, practically unknown outside of Spain before the year 1561? The city’s fortunes changed that year when it burst upon the scene of European politics by becoming the permanent capital of Spain. The dynasty at the head of this change was known as the Habsburgs, a family that ruled the country and much of the known world from the 16th to the 18th century, and who were referred to in Spain as the House of Austria.
The historic center of Madrid was built up predominantly during the reign of that same dynasty, and this fascinating walking tour of Madrid de los Austrias takes you through that area, giving you the best introduction to the Spanish capital.
Bridging the Data Gaps to Democratize AI in Science, Education and Society
The democratization of Artificial Intelligence (AI) necessitates an ecosystem where data and research infrastructure are seamlessly integrated and universally accessible. This talk gives an overview of the imperative of bridging the gaps between these components through robust services, facilitating an inclusive AI landscape that empowers diverse research communities and domains. The National Data Platform (NDP) aims to lower the barriers to entry for AI research and applications through an integrated services approach that streamlines AI workflows, from data acquisition to model deployment. This approach underscores the importance of open, extensible, and equitable systems in driving forward the capabilities of AI, ultimately contributing to the resolution of grand scientific and societal challenges. By examining real case studies that leverage open data platforms and scalable research infrastructure, the talk will highlight how composable systems and services in NDP make it a platform that empowers users from all backgrounds to engage in meaningful research, learning, and discovery.
10:30 | Making easier the life-cycle management of complex application workflows (abstract) |
11:10 | Pre-Scheduling of Affine Loops for HLS Pipelining (abstract) |
11:40 | Evaluation of CPU constraining mechanisms in the LHC ALICE experiment Grid (abstract) |
10:30 | Supporting HPC Centers: challenges, horror stories and best practices (abstract) |
11:00 | ParaTools Pro for E4S (abstract) |
11:30 | E4 at the forefront of European HPC (abstract) |
13:30 | Efficient Code Region Characterization through Automatic Performance Counters Reduction using Machine Learning Techniques (abstract) PRESENTER: Suren Harutyunyan Gevorgyan |
13:50 | FlexiGran: Flexible Granularity Locking in Hierarchies (abstract) PRESENTER: Anju Mongandampulath Akathoott |
14:10 | ESIMD GPU implementations of Deep Learning Sparse Matrix Kernels (abstract) PRESENTER: Christoph Bauinger |
13:30 | Accelerating Stencil Computation with Fully Homomorphic Encryption Using GPU (abstract) PRESENTER: Pei Li |
13:50 | A Framework for Automated Parallel Execution of Scientific Multi-Workflow Applications in the Cloud with Work Stealing (abstract) PRESENTER: Helena Schubert da Incarnacao Lima da Silva |
14:10 | Accelerating Large-Scale Sparse LU Factorization for RF Circuit Simulation (abstract) PRESENTER: Guofeng Feng |
13:30 | DProbe: Profiling and Predicting Multi-Tenant Deep Learning Workloads for GPU Resource Scaling (abstract) PRESENTER: Zechun Zhou |
13:50 | Towards High-Performance Transactions via Hierarchical Blockchain Sharding (abstract) PRESENTER: Haibo Tang |
14:10 | Automated Data Management and Learning-based Scheduling for Ray-based Hybrid HPC-Cloud Systems (abstract) PRESENTER: Tingkai Liu |
13:30 | PCTC: Hardware and Software Co-Design for Pruned Capsule Networks on Tensor Cores (abstract) PRESENTER: Ehsan Atoofian |
13:50 | A Folded Computation-in-Memory Accelerator for Fast Polynomial Multiplication in BIKE (abstract) PRESENTER: Zewen Ye |
14:10 | MEPAD: A Memory-efficient Parallelized Direct Convolution Algorithm for Deep Neural Networks (abstract) PRESENTER: Leandro Fiorin |