Days: Wednesday, August 28th Thursday, August 29th Friday, August 30th

Wednesday, August 28th

09:30-10:30 Session 1: Keynote: Mateo Valero

European supercomputers: buying versus building 

In 2017, Europe created the EuroHPC initiative and its associated legal funding structure, the “EuroHPC JU” Joint Undertaking, with two main objectives. The first objective is to acquire, build and deploy world-class high performance computing (HPC) infrastructure across Europe. The second objective is to conduct research and development to build HPC hardware manufactured in Europe, as well as the applications (software) that would run on future locally developed European supercomputers.

This talk will cover both objectives in detail. On the one hand, Europe has recently committed a substantial amount of money to the first goal. For example, in the June 2024 Top-500 list, 9 of the Top-20 supercomputers are from Europe. We will go deeper and describe the two main components of the heterogeneous MareNostrum 5 supercomputer, listed separately in positions 8 and 22 of the June 2024 Top-500. Installed at our Barcelona site, MareNostrum 5 is a good illustration of the challenges of building a contemporary supercomputer; for example, space requirements meant that BSC could no longer house it within our Church. Therefore, MareNostrum 5 had to be installed in a larger space, while the Church will be used to install our first Quantum Computer, thus fulfilling the prophecy made by Dan Brown in his book "Origin".

On the other hand, and as the second part of my talk, I will describe the European approach to designing general Made-in-Europe processors and accelerators leveraging the RISC-V Open Instruction Set Architecture (ISA). Currently, this approach is embodied in several large-scale European research projects, namely the European Processor Initiative, EUPilot, and Eprocessor, as well as some nationally funded projects. I will briefly describe these projects, including the proof-of-concept chips that successfully boot Linux. I will also briefly hint at the future and describe the initiatives that Europe and the BSC are pursuing with the main goal of developing software and hardware for the MareNostrum 6 supercomputer, which should be a reality in 2027-2028.

Location: Auditorium
10:30-11:00 Coffee Break
11:00-13:00 Session 2: Best Paper Candidates Session
Location: Auditorium
Milo Lurati (VU Amsterdam, Netherlands)
Stijn Heldens (Netherlands eScience Center, Netherlands)
Alessio Sclocco (Netherlands eScience Center, Netherlands)
Ben van Werkhoven (Leiden University, Netherlands)
Bringing auto-tuning to HIP: Analysis of tuning impact and difficulty on AMD and Nvidia GPUs (Artifact) (abstract)
Olivier Beaumont (Univ. Bordeaux, CNRS, Bordeaux INP, Inria, LaBRI, UMR 5800, Talence, France)
Rémi Bouzel (Qarnot Computing, Montrouge, France)
Lionel Eyraud-Dubois (Univ. Bordeaux, CNRS, Bordeaux INP, Inria, LaBRI, UMR 5800, Talence, France)
Esragul Korkmaz (Univ. Bordeaux, CNRS, Bordeaux INP, Inria, LaBRI, UMR 5800, Talence, France)
Laercio Pilla (Univ. Bordeaux, CNRS, Bordeaux INP, Inria, LaBRI, UMR 5800, Talence, France)
Alexandre Van Kempen (Qarnot Computing, Montrouge, France)
A 1.25(1+ε)-Approximation Algorithm for Scheduling with Rejection Costs Proportional to Processing Times (Artifact) (abstract)
PRESENTER: Esragul Korkmaz
Hamidreza Ramezanikebrya (University of British Columbia, Canada)
Matei Ripeanu (University of British Columbia, Canada)
(re)Assessing PiM Effectiveness for Sequence Alignment (abstract)
PRESENTER: Matei Ripeanu
Thorsten Wittkopp (Technische Universität Berlin, Germany)
Philipp Wiesner (Technische Universität Berlin, Germany)
Odej Kao (Technische Universität Berlin, Germany)
LogRCA: Log-based Root Cause Analysis for Distributed Services (abstract)
Kåre von Geijer (Chalmers University of Technology, Sweden)
Philippas Tsigas (Chalmers University of Technology, Sweden)
How to Relax Instantly: Elastic Relaxation of Concurrent Data Structures (Artifact) (abstract)
PRESENTER: Kåre von Geijer
Richard Angersbach (Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany)
Sebastian Kuckuk (Friedrich-Alexander-Universität Erlangen-Nürnberg, National High Performance Computing Center, Germany)
Harald Köstler (Friedrich-Alexander-Universität Erlangen-Nürnberg, National High Performance Computing Center, Germany)
Code Generation for Octree-Based Multigrid Solvers with Fused Higher-Order Interpolation and Communication (abstract)
13:00-14:00 Lunch Break
15:00-15:30 Coffee Break
15:30-17:30 Session 4A: Architectures and Accelerators (I)
Location: -1.A.02
Marius Meyer (Paderborn University, Germany)
Tobias Kenter (Paderborn University, Germany)
Kenneth O'Brien (AMD Research, Ireland)
Lucian Petrica (AMD Research, Ireland)
Michaela Blott (AMD Research, Ireland)
Christian Plessl (Paderborn University, Germany)
Optimizing Communication for Latency Sensitive HPC Applications on up to 48 FPGAs Using ACCL (abstract)
PRESENTER: Marius Meyer
Keegan Sanchez (Washington State University Vancouver, United States)
Alex Gavin (Washington State University Vancouver, United States)
Suren Byna (The Ohio State University, United States)
Kesheng Wu (Lawrence Berkeley National Laboratory, United States)
Xuechen Zhang (Washington State University Vancouver, United States)
A High-Performance Collective I/O Framework Leveraging Node-Local Persistent Memory (abstract)
PRESENTER: Keegan Sanchez
Yunkun Liao (SKLP, Institute of Computing Technology, CAS; University of Chinese Academy of Sciences; Zhongguancun Laboratory, China)
Jingya Wu (SKLP, Institute of Computing Technology, CAS, China)
Wenyan Lu (SKLP, Institute of Computing Technology, CAS; YUSUR Tech Co., Ltd, China)
Xiaowei Li (SKLP, Institute of Computing Technology, CAS; Zhongguancun Laboratory, China)
Guihai Yan (SKLP, Institute of Computing Technology, CAS; YUSUR Tech Co., Ltd, China)
Efficient RNIC Cache Side-channel Attack Detection through DPU-driven Architecture (abstract)
PRESENTER: Yunkun Liao
Pedro Rigon (Institute of Informatics - UFRGS, Brazil)
Brenda Schussler (Institute of Informatics - UFRGS, Brazil)
Alexandre Sardinha (Petrobras, Brazil)
Pedro Mario Silva (NVIDIA, Brazil)
Fábio Alves de Oliveira (NVIDIA, Brazil)
Alexandre Carissimi (INF/UFRGS, Brazil)
Jairo Panetta (ITA, Brazil)
Arthur Lorenzon (Federal University of Rio Grande do Sul, Brazil)
Philippe Navaux (UFRGS, Brazil)
Harnessing Data Movement Strategies to Optimize Performance-Energy Efficiency of Oil & Gas Simulations in HPC (abstract)
PRESENTER: Arthur Lorenzon
15:30-17:30 Session 4B: Theory and Algorithms (I)
Location: -1.A.03
Andrzej Lingas (Lund University, Sweden)
Boolean Matrix Multiplication for Highly Clustered Data on the Congested Clique (abstract)
Eunji Lee (Sogang University, South Korea)
Yoonsang Han (Sogang University, South Korea)
Gordon Moon (Sogang University, South Korea)
Accelerated Block-Sparsity-Aware Matrix Reordering for Leveraging Tensor Cores in Sparse Matrix-Multivector Multiplication (Artifact) (abstract)
Roy Nissim (The Hebrew University of Jerusalem, Israel)
Oded Schwartz (The Hebrew University, Israel)
Yuval Spiizer (Tel-Aviv University, Israel)
Communication Minimizing Toom-Cook Algorithms (abstract)
PRESENTER: Yuval Spiizer
Stef Graillat (Sorbonne Université, CNRS, LIP6, Paris, France)
Fabienne Jézéquel (Sorbonne Université, CNRS, LIP6, Paris, France)
Théo Mary (Sorbonne Université, CNRS, LIP6, Paris, France)
Roméo Molina (Sorbonne Université, CNRS, LIP6, Paris, France)
Daichi Mukunoki (--, Japan)
Reduced-Precision and Reduced-Exponent Formats for Adaptive-Precision Sparse Matrix-Vector Product (abstract)
PRESENTER: Roméo Molina
15:30-17:30 Session 4C: Multidisciplinary, Domain-Specific and Applied Parallel and Distributed Computing (I)
Location: -1.A.04
Jiajun Song (Tsinghua University, China)
Jiajun Luo (Southern University of Science and Technology, China)
Rongwei Lu (Tsinghua University, China)
Shuzhao Xie (Tsinghua University, China)
Bin Chen (Harbin Institute of Technology, Shenzhen, China)
Zhi Wang (Tsinghua University, China)
A Joint Approach to Local Updating and Gradient Compression for Efficient Asynchronous Federated Learning (abstract)
PRESENTER: Jiajun Song
Cristian Tatu (Barcelona Supercomputing Center, Spain)
Javier Conejero (Barcelona Supercomputing Center, Spain)
Fernando Vazquez (Barcelona Supercomputing Center, Spain)
Rosa M. Badia (Barcelona Supercomputing Center, Spain)
GPU Cache System for COMPSs: A Task-Based Distributed Computing Framework (abstract)
PRESENTER: Cristian Tatu
Zhuoyao Huang (State Key Laboratory of Mobile Network and Mobile Multimedia Technology; ZTE Co., Ltd, China)
Nan Zhang (Southern University of Science and Technology (SUSTech), Shenzhen, China)
Jingran Shen (Southern University of Science and Technology (SUSTech), Shenzhen, China)
Georgios Diamantopoulos (Southern University of Science and Technology (SUSTech), Shenzhen, China; University of Birmingham, Birmingham, UK, China)
Zhengchang Hua (Southern University of Science and Technology (SUSTech), Shenzhen, China; University of Leeds, Leeds, UK, China)
Nikos Tziritas (University of Thessaly, Volos, Greece)
Georgios Theodoropoulos (SUSTech, China)
Distributed Simulation for Digital Twins of Large-Scale Real-World DiffServ-Based Networks (abstract)
Subhajit Sahu (International Institute of Information Technology Hyderabad, India)
Kishore Kothapalli (International Institute of Information Technology Hyderabad, India)
Hemalatha Eedi (JNTUH College of Engineering Hyderabad, India)
Sathya Peri (Indian Institute of Technology Hyderabad, India)
DF* PageRank: Incrementally Expanding Approaches for Updating PageRank on Dynamic Graphs (Artifact) (abstract)
PRESENTER: Subhajit Sahu
Tiago Carneiro (IMEC, Belgium)
Engin Kayraklioglu (HPE, United States)
Guillaume Helbecque (University of Lille, France)
Nouredine Melab (INRIA Lille, France)
Investigating Portability in Chapel for Tree-based Optimization on GPU-powered Clusters (abstract)
PRESENTER: Tiago Carneiro
15:30-17:30 Session 4D: Data analytics, AI, and Computational Science (I)
Location: -1.A.06
Krishna Teja Chitty-Venkata (Argonne National Laboratory, United States)
Sanjif Shanmugavelu (Groq Inc., UK)
Varuni Katti Sastry (Argonne National Laboratory, United States)
Murali Emani (Argonne National Laboratory, United States)
Venkatram Vishwanath (Argonne National Laboratory, United States)
Sylvia Howland (Cerebras Systems, United States)
WActiGrad: Structured Pruning for Efficient Finetuning and Inference of Large Language Models on AI Accelerators (abstract)
PRESENTER: Murali Emani
Hewang Nie (School of Cyber Science and Engineering, Huazhong University of Science and Technology, China)
Songfeng Lu (School of Cyber Science and Engineering, Huazhong University of Science and Technology, China)
Mu Wang (Clinical Research Institute, Affiliated South China Hospital, University of South China, China)
Jue Xiao (School of Cyber Science and Engineering, Huazhong University of Science and Technology, China)
Zhi Lu (School of Cyber Science and Engineering, Huazhong University of Science and Technology, China)
Zepu Yi (School of Cyber and Space Security, Huazhong University of Science and Technology, China)
VeriChroma: Ownership Verification for Federated Models via RGB Filters (abstract)
Yuxiang Zhang (SKLP, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences, China)
Xin Liu (SKLP, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences, China)
Meng Wu (SKLP, Institute of Computing Technology, Chinese Academy of Sciences, China)
Mingyu Yan (SKLP, Institute of Computing Technology, Chinese Academy of Sciences, China)
Wei Yan (SKLP, Institute of Computing Technology, Chinese Academy of Sciences; Zhongguancun Laboratory, China)
Xiaochun Ye (SKLP, Institute of Computing Technology, Chinese Academy of Sciences, China)
Dongrui Fan (SKLP, Institute of Computing Technology, Chinese Academy of Sciences, China)
Disttack: Graph Adversarial Attacks Toward Distributed GNN Training (abstract)
PRESENTER: Yuxiang Zhang
Pranjal Naman (Indian Institute of Science, India)
Yogesh Simmhan (Indian Institute of Science, India)
Optimizing Federated Learning using Remote Embeddings for Graph Neural Networks (abstract)
PRESENTER: Pranjal Naman
Guangyao Zhou (University of Electronic Science and Technology of China, China)
Haocheng Lan (University of Electronic Science and Technology of China, China)
Yuanlun Xie (University of Electronic Science and Technology of China, China)
Wenhong Tian (University of Electronic Science and Technology of China, China)
Jiahong Qian (Huawei Technologies Co. Ltd, China)
Teng Su (Huawei Technologies Co. Ltd, China)
CSIMD: Cross-Search Algorithm with Improved Multi-Dimensional Dichotomy for Micro-batch-based Pipeline Parallel Training in DNN (abstract)
PRESENTER: Guangyao Zhou
Thursday, August 29th

10:00-10:30 Coffee Break
10:30-12:30 Session 6A: Architectures and Accelerators (II)
Location: -1.A.02
Xingbin Wang (Institute of Information Engineering, Chinese Academy of Sciences, China)
Dan Meng (Institute of Information Engineering, Chinese Academy of Sciences, China)
Rui Hou (Institute of Information Engineering, Chinese Academy of Sciences, China)
FakeGuard: A Novel Accelerator Architecture for Deepfake Detection Networks (abstract)
PRESENTER: Xingbin Wang
Hongbing Tan (National University of Defense Technology, China)
Xiaowei He (National University of Defense Technology, China)
Libo Huang (National University of Defense Technology, China)
Guichu Sun (National University of Defense Technology, China)
Yuanhu Cheng (National University of Defense Technology, China)
Jing Zhang (National University of Defense Technology, China)
Zhong Zheng (National University of Defense Technology, China)
Quan Deng (National University of Defense Technology, China)
Bingcai Sui (National University of Defense Technology, China)
Yongwen Wang (National University of Defense Technology, China)
Liquan Xiao (National University of Defense Technology, China)
ImSPU: Implicit Sharing of Computation Resources between Vector and Scalar Processing Units (abstract)
Dengke Han (State Key Lab of Processors, Institute of Computing Technology, CAS; University of Chinese Academy of Sciences, China)
Meng Wu (Institute of Computing Technology, Chinese Academy of Sciences, China)
Runzhen Xue (State Key Lab of Processors, Institute of Computing Technology, CAS; University of Chinese Academy of Sciences, China)
Mingyu Yan (Institute of Computing Technology, Chinese Academy of Sciences, China)
Xiaochun Ye (Institute of Computing Technology, Chinese Academy of Sciences, China)
Dongrui Fan (Institute of Computing Technology, Chinese Academy of Sciences, China)
ADE-HGNN: Accelerating HGNNs through Attention Disparity Exploitation (abstract)
Dario Muñoz-Muñoz (Universidad Carlos III de Madrid, Spain)
Félix García-Carballeira (Universidad Carlos III de Madrid, Spain)
Diego Camarmas-Alonso (Universidad Carlos III de Madrid, Spain)
Alejandro Calderón-Mateos (Universidad Carlos III de Madrid, Spain)
Jesús Carretero (Universidad Carlos III de Madrid, Spain)
Fault tolerant in the Expand Ad-Hoc parallel file system (Artifact) (abstract)
Jonas Hahnfeld (CERN, Switzerland)
Jakob Blomer (CERN, Switzerland)
Thorsten Kollegger (Goethe University Frankfurt, Germany)
Parallel Writing of Nested Data in Columnar Formats (Artifact) (abstract)
PRESENTER: Jonas Hahnfeld
10:30-12:30 Session 6B: Data analytics, AI, and Computational Science (II)
Location: -1.A.06
Yuhang Li (Shanghai University, China)
Tong Liu (Shanghai University, China)
Wenfeng Shen (Shanghai Polytechnic University, China)
Yangguang Cui (Shanghai University, China)
Weijia Lu (United Automotive Electronic Systems, China)
Improving Generalization and Personalization in Long-Tailed Federated Learning via Classifier Retraining (abstract)
Mengde Zhu (Beijing University of Posts and Telecommunications, China)
Wanyi Ning (Beijing University of Posts and Telecommunications, China)
Qi Qi (Beijing University of Posts and Telecommunications, China)
Jingyu Wang (Beijing University of Posts and Telecommunications, China)
Zirui Zhuang (Beijing University of Posts and Telecommunications, China)
Haifeng Sun (Beijing University of Posts and Telecommunications, China)
Jun Huang (Meituan Corporation, China)
Jianxin Liao (Beijing University of Posts and Telecommunications, China)
FLUK: Protecting Federated Learning against Malicious Clients for Internet of Vehicles (abstract)
Haoran Dang (SKLP, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences, China)
Meng Wu (SKLP, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences, China)
Mingyu Yan (SKLP, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences, China)
Xiaochun Ye (SKLP, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences, China)
Dongrui Fan (SKLP, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences, China)
GDL-GNN: Applying GPU Dataloading of Large Datasets for Graph Neural Network Inference (abstract)
PRESENTER: Haoran Dang
Weigang Zhang (Institute of Information Engineering. School of Cyber Security, University of Chinese Academy of Sciences, China)
Biyu Zhou (Institute of Information Engineering. School of Cyber Security, University of Chinese Academy of Sciences, China)
Xing Wu (Institute of Information Engineering. School of Cyber Security, University of Chinese Academy of Sciences, China)
Chaochen Gao (Institute of Information Engineering. School of Cyber Security, University of Chinese Academy of Sciences, China)
Zhibing Liu (Institute of Information Engineering. School of Cyber Security, University of Chinese Academy of Sciences, China)
Xuehai Tang (Institute of Information Engineering. School of Cyber Security, University of Chinese Academy of Sciences, China)
Ruixuan Li (Institute of Information Engineering. School of Cyber Security, University of Chinese Academy of Sciences, China)
Jizhong Han (Institute of Information Engineering. School of Cyber Security, University of Chinese Academy of Sciences, China)
Songlin Hu (Institute of Information Engineering. School of Cyber Security, University of Chinese Academy of Sciences, China)
Quartet: A Holistic Hybrid Parallel Framework for Training Large Language Models (abstract)
PRESENTER: Weigang Zhang
Héctor Martínez (Universidad de Córdoba, Spain)
Francisco D. Igual (Universidad Complutense de Madrid, Spain)
Rafael Rodríguez-Sánchez (Universidad de Castilla-La Mancha, Spain)
Sandra Catalan (Universitat Jaume I, Spain)
Adrián Castelló (Universitat Politècnica de València, Spain)
Enrique S. Quintana-Orti (Universitat Politècnica de València, Spain)
Inference with Transformer Encoders on ARM and RISC-V Multicore Processors (abstract)
10:30-12:30 Session 6C: Multidisciplinary, Domain-Specific and Applied Parallel and Distributed Computing (II)
Location: Auditorium
Júnior Löff (Università della Svizzera italiana (USI), Switzerland)
Dalvan Griebler (Pontifical Catholic University of Rio Grande do Sul (PUCRS), Brazil)
Luiz Gustavo Fernandes (Pontifical Catholic University of Rio Grande do Sul (PUCRS), Brazil)
Walter Binder (Università della Svizzera italiana (USI), Switzerland)
MPR: An MPI Framework for Distributed Self-Adaptive Stream Processing (abstract)
PRESENTER: Júnior Löff
Dian-Lun Lin (University of Wisconsin at Madison, United States)
Joshua San Miguel (University of Wisconsin at Madison, United States)
Umit Ogras (University of Wisconsin at Madison, United States)
Tsung-Wei Huang (University of Wisconsin at Madison, United States)
TaroRTL: Accelerating RTL Simulation using Coroutine-based Heterogeneous Task Graph Scheduling (abstract)
PRESENTER: Tsung-Wei Huang
Thiago Maltempi (Universidade Estadual de Campinas, Brazil)
Sandro Rigo (Universidade Estadual de Campinas, Brazil)
Marcio Pereira (Universidade Estadual de Campinas, Brazil)
Hervé Yviquel (Universidade Estadual de Campinas, Brazil)
Jessé Costa (Pará Federal University, Brazil)
Guido Araujo (Universidade Estadual de Campinas, Brazil)
Combining Compression and Prefetching to Improve Checkpointing for Inverse Seismic Problems in GPUs (abstract)
PRESENTER: Thiago Maltempi
Andoni Salcedo Navarro (Universitat de València, Spain)
Raúl Peña Ortiz (Universitat de València, Spain)
José M. Claver (Universitat de València, Spain)
Miguel Garcia Pineda (Universitat de València, Spain)
Juan Gutiérrez-Aguado (Universitat de València, Spain)
Cloud-native GPU-enabled architecture for parallel video encoding (abstract)
Xiaokang Fan (National University of Defense Technology, China)
Zhen Ge (National University of Defense Technology, China)
Sifan Long (Central South University, China)
Tao Tang (National University of Defense Technology, China)
Chun Huang (National University of Defense Technology, China)
Lin Peng (National University of Defense Technology, China)
Canqun Yang (National University of Defense Technology, China)
VLASPH: Smoothed Particle Hydrodynamics on VLA SIMD Architectures (abstract)
PRESENTER: Xiaokang Fan
10:30-12:30 Session 6D: Scheduling, Resource Management, Cloud, Edge Computing, and Workflows (I)
Location: -1.A.05
Louis-Claude Canon (FEMTO-ST, Université de Franche-Comté, France)
Anthony Dugois (FEMTO-ST, Université de Franche-Comté, France)
Loris Marchal (LIP, ENS Lyon, CNRS, France)
Solving the Restricted Assignment Problem to Schedule Multi-Get Requests in Key-Value Stores (Artifact) (abstract)
PRESENTER: Anthony Dugois
Sixing Yu (Iowa State University, United States)
Pablo Munoz (Intel Labs, United States)
Ali Jannesari (University of California, Berkeley, United States)
Resource-Aware Heterogeneous Federated Learning with Specialized Local Models (abstract)
Vincent Fagnon (LCOMS, University of Lorraine, France)
Giorgio Lucarelli (LCOMS, University of Lorraine, France)
Christophe Rapine (LCOMS, University of Lorraine, France)
Makespan Minimization for Scheduling on Heterogeneous Platforms with Precedence Constraints (abstract)
Zhengda Wu (Academy of Military Science, China)
Yixiao Feng (Academy of Military Science, China)
Mingtai Lv (Academy of Military Science, China)
Sining Yang (Academy of Military Science, China)
Bo Zhang (Academy of Military Science, China)
Deadline-driven Enhancements and Response Time Analysis of ROS2 Multi-threaded Executors (abstract)
Danilo Carastan-Santos (Université Grenoble Alpes, France)
Georges Da Costa (Université de Toulouse, CNRS, IRIT, France)
Millian Poquet (Université de Toulouse, CNRS, IRIT, France)
Patricia Stolf (Université de Toulouse, CNRS, IRIT, France)
Denis Trystram (Université Grenoble Alpes, France)
Light-weight prediction for improving energy consumption in HPC platforms (Artifact) (abstract)
12:30-13:30 Lunch Break
13:30-15:30 Session 7A: Scheduling, Resource Management, Cloud, Edge Computing, and Workflows (II)
Location: Auditorium
Jiazhi Jiang (Sun Yat-sen University, China)
Hongbin Zhang (Sun Yat-sen University, China)
Deyin Liu (Sun Yat-sen University, China)
Jiangsu Du (Sun Yat-sen University, China)
Xiaojiao Yao (Sun Yat-sen University, China)
Jinhui Wei (Sun Yat-sen University, China)
Pin Chen (Sun Yat-sen University, China)
Dan Huang (Sun Yat-sen University, China)
Yutong Lu (Sun Yat-sen University, China)
Efficient Coupling Streaming AI and Ensemble Simulations on HPC Clusters (abstract)
Filip Mikina (University of Warsaw, Poland)
Paweł Żuk (University of Southern California, United States)
Krzysztof Rzadca (University of Warsaw, Poland)
sAirflow: Adopting Serverless in a Legacy Workflow Scheduler (abstract)
Farah Ait Salaht (Léonard de Vinci Pôle Universitaire, Research Center, France)
Nora Izri (Léonard de Vinci Pôle Universitaire, Research Center, France)
Maher Rebai (Léonard de Vinci Pôle Universitaire, Research Center, France)
Optimizing Service Replication and Placement for IoT Applications in Fog Computing Systems (abstract)
PRESENTER: Farah Ait Salaht
Alexis Bandet (Inria, France)
Francieli Boito (Inria, France)
Guillaume Pallez (Inria, France)
Scheduling distributed I/O resources in HPC systems (abstract)
PRESENTER: Alexis Bandet
Qian Yang (National University of Defense Technology, China)
Xuyan Jiang (National University of Defense Technology, China)
Wei Quan (National University of Defense Technology, China)
Rulin Liu (National University of Defense Technology, China)
Zhigang Sun (National University of Defense Technology, China)
Node Bundle Scheduling: An Ultra-Low Latency Traffic Scheduling Algorithm for TAS-based Time-Sensitive Networks (abstract)
13:30-15:30 Session 7B: Architectures and Accelerators (III)
Location: -1.A.02
Steef Hegeman (Leiden Institute of Advanced Computer Science, Leiden University, Netherlands)
Daan Wöltgens (Eindhoven University of Technology and TNO, Netherlands)
Anton Wijs (Eindhoven University of Technology, Netherlands)
Alfons Laarman (Leiden Institute of Advanced Computer Science, Leiden University, Netherlands)
Compact Parallel Hash Tables on the GPU (Artifact) (abstract)
Gabriel Gomez-Lopez (Universidad de Castilla-La Mancha, Spain)
Miguel Sánchez de la Rosa (Universidad de Castilla-La Mancha, Spain)
Jesus Escudero-Sahuquillo (University of Castilla-La Mancha, Spain)
Pedro Javier Garcia (Universidad de Castilla-La Mancha, Spain)
Francisco J. Quiles (Universidad de Castilla-La Mancha, Spain)
Pierre-Axel Lagadec (Atos, France)
Hybrid Congestion Control for BXI-based Interconnection Networks (abstract)
Stepan Nassyr (Forschungszentrum Juelich / JSC, Germany)
Dirk Pleiter (KTH Royal Institute of Technology, Sweden)
Exploring processor micro-architectures optimised for BLAS3 micro-kernels (Artifact) (abstract)
PRESENTER: Stepan Nassyr
Xuan Zhang (Shanghai Jiao Tong University, China)
Zhuoran Song (Shanghai Jiao Tong University, China)
Fangxin Liu (Shanghai Jiao Tong University, China)
Zhezhi He (Shanghai Jiao Tong University, China)
Li Jiang (Shanghai Jiao Tong University, China)
Xiaoyao Liang (Shanghai Jiao Tong University, China)
Watt: A Write-optimized RRAM-based Accelerator for Attention (abstract)
13:30-15:30 Session 7C: Theory and Algorithms (II)
Location: -1.A.03
Ivo Gabe de Wolff (Utrecht University, Netherlands)
Daniel Anderson (Carnegie Mellon University, United States)
Gabriele K. Keller (Utrecht University, Netherlands)
Aleksei Seletskiy (Carnegie Mellon University, United States)
A Fast Wait-Free Solution to Read-Reclaim Races in Reference Counting (Artifact) (abstract)
Qasim Abbas (Queen's University Belfast, UK)
Mohsen Koohi Esfahani (Queen's University Belfast, UK)
Ian Overton (Queen's University Belfast, UK)
Hans Vandierendonck (Queen's University Belfast, UK)
QClique: Optimizing Performance and Accuracy in Maximum Weighted Clique (abstract)
PRESENTER: Qasim Abbas
Sharon Boddu (Texas A&M University-Kingsville, United States)
Maleq Khan (Texas A&M University-Kingsville, United States)
ALZI: An Improved Parallel Algorithm for Finding Connected Components in Large Graphs (abstract)
Matthieu Robeyns (Université Paris-Saclay, CNRS, LISN, France)
Marc Baboulin (Université Paris-Saclay, CNRS, LISN, France)
Simplice Donfack (Université Paris-Saclay, CNRS, LISN, France)
Oguz Kaya (Université Paris-Saclay, CNRS, LISN, France)
Théo Mary (Sorbonne Université, CNRS, LIP6, France)
Mixed precision randomized low-rank approximation with GPU tensor cores (abstract)
PRESENTER: Matthieu Robeyns
Filippo Ziche (University of Verona, Italy)
Federico Busato (University of Verona, Italy)
Rosalba Giugno (University of Verona, Italy)
Nicola Bombieri (University of Verona, Italy)
GPU-Accelerated BFS for Dynamic Networks (abstract)
PRESENTER: Filippo Ziche
13:30-15:30 Session 7D: Programming, Compilers and Performance (II)
Location: -1.A.04
Greg Henry (Intel Corporation, United States)
Eric Petit (Intel Corporation, United States)
Alexander Lyashevsky (Intel Corporation, United States)
Peter Caday (Intel Corporation, United States)
Deconstructing HPL-MxP benchmark: a numerical perspective (abstract)
Le Chen (Iowa State University, United States)
Arijit Bhattacharjee (Iowa State University, United States)
Nesreen Ahmed (Intel Labs, United States)
Niranjan Hasabnis (Intel Labs, United States)
Gal Oren (Technion - Israel Institute of Technology, Israel)
Vy Vo (Intel Labs, United States)
Ali Jannesari (Iowa State University, United States)
OMPGPT: A Generative Pre-trained Transformer Model for OpenMP (abstract)
Bizhao Shi (Peking University, China)
Tuo Dai (Peking University, China)
Sunan Zou (Peking University, China)
Xinming Wei (Peking University, China)
Guojie Luo (Peking University, China)
ImageMap: Enabling Efficient Mapping from Image Processing DSL to CGRA (abstract)
Lucas Van Lanker (CEA, France)
Hugo Taboada (CEA, France)
Elisabeth Brunet (Télécom SudParis, Institut Polytechnique de Paris, Inria, France)
François Trahay (Télécom SudParis, Institut Polytechnique de Paris, Inria, France)
Predicting GPU kernel's performance on upcoming architectures (abstract)
PRESENTER: Lucas Van Lanker
Bengisu Elis (Technical University of Munich, Germany)
David Boehme (Lawrence Livermore National Laboratory, United States)
Olga Pearce (Lawrence Livermore National Laboratory, United States)
Martin Schulz (Technical University of Munich, Germany)
A Mechanism to Generate Interception Based Tools for HPC Libraries (abstract)
PRESENTER: Bengisu Elis
15:30-16:00 Coffee Break
16:00-17:30 Session 8A: Data analytics, AI, and Computational Science (III)
Location: -1.A.06
Kohei Hiraga (University of Tsukuba, Japan)
Osamu Tatebe (University of Tsukuba, Japan)
PEANUTS: A Persistent Memory-Based Network Unilateral Transfer System for Enhanced MPI-IO Data Transfer (Artifact) (abstract)
PRESENTER: Kohei Hiraga
Lin Wang (Huazhong University of Science and Technology, China)
Yuchong Hu (Huazhong University of Science and Technology, China)
Yuxue Liu (Huazhong University of Science and Technology, China)
Renzhi Xiao (Huazhong University of Science and Technology, China)
Dan Feng (Huazhong University of Science and Technology, China)
Asymmetric Coded Distributed Computation for Resilient Prediction Serving Systems (abstract)
Yunkun Liao (SKLP, Institute of Computing Technology, CAS; University of Chinese Academy of Sciences; Zhongguancun Laboratory, China)
Hanyue Lin (SKLP, Institute of Computing Technology, CAS; University of Chinese Academy of Sciences, China)
Jingya Wu (SKLP, Institute of Computing Technology, CAS, China)
Wenyan Lu (SKLP, Institute of Computing Technology, CAS; YUSUR Tech Co., Ltd, China)
Huawei Li (SKLP, Institute of Computing Technology, CAS, China)
Xiaowei Li (SKLP, Institute of Computing Technology, CAS; Zhongguancun Laboratory, China)
Guihai Yan (SKLP, Institute of Computing Technology, CAS; YUSUR Tech Co., Ltd, China)
Athena: Add More Intelligence to RMT-based Network Data Plane with Low-bit Quantization (abstract)
PRESENTER: Yunkun Liao
Zhi Lu (Huazhong University of Science and Technology, China)
Songfeng Lu (Huazhong University of Science and Technology, China)
Yongquan Cui (Huazhong University of Science and Technology, China)
Junjun Wu (Huazhong University of Science and Technology, China)
Hewang Nie (Huazhong University of Science and Technology, China)
Jue Xiao (Huazhong University of Science and Technology, China)
Zepu Yi (Huazhong University of Science and Technology, China)
Lightweight Byzantine-Robust and Privacy-Preserving Federated Learning (abstract)
Jiguang Lv (College of Computer Science and Technology, Harbin Engineering University, China)
Shuchun Xu (College of Computer Science and Technology, Harbin Engineering University, China)
Xiaodong Zhan (Changan Communication Technology Co., LTD, China)
Tao Liu (College of Computer Science and Technology, Harbin Engineering University, China)
Dapeng Man (College of Computer Science and Technology, Harbin Engineering University, China)
Wu Yang (College of Computer Science and Technology, Harbin Engineering University, China)
FedGG: Leveraging Generative Adversarial Networks and Gradient Smoothing for Privacy Protection in Federated Learning (abstract)
16:00-17:30 Session 8B: Multidisciplinary, Domain-Specific and Applied Parallel and Distributed Computing (III)
Location: -1.A.04
Yuang Chen (The Chinese University of Hong Kong, Hong Kong)
Jeffery Xu Yu (The Chinese University of Hong Kong, Hong Kong)
Vectorizing Sparse Blocks of Graph Matrices for SpMV (abstract)
L. Felipe Romero (University of Almería, Spain)
Marcos Lupión Lorente (University of Almería, Spain)
N. C. Cruz (University of Almería, Spain)
Luis F. Romero (Universidad de Málaga, Spain)
Pilar M. Ortigosa (University of Almería, Spain)
On the use of hybrid computing for accelerating EEG preprocessing (abstract)
PRESENTER: L. Felipe Romero
Jie Jia (Beihang University, China)
Yi Liu (Beihang University, China)
Yifan Chen (Beihang University, China)
Yanke Liu (Beihang University, China)
Fang Lin (Beihang University, China)
AdapCK: Optimizing I/O for Checkpointing on Large-scale High Performance Computing Systems (abstract)
Dazheng Liu (Hunan University, China)
Xiaoli Ren (National University of Defense Technology, China)
Jianping Wu (National University of Defense Technology, China)
Wenjuan Liu (Hunan University, China)
Juan Zhao (National University of Defense Technology, China)
Shaoliang Peng (Hunan University, China)
Pipe-AGCM: A Fine-grain Pipelining Scheme for Optimizing the Parallel Atmospheric General Circulation Model (abstract)
16:00-17:30 Session 8C: Scheduling, Resource Management, Cloud, Edge Computing, and Workflows (III)
Location: -1.A.05
Handong Luo (Fudan University, China)
Wenhao Liu (Fudan University, China)
Qi Zhang (Fudan University, China)
Ziheng Yang (Fudan University, China)
Quanwei Lin (Fudan University, China)
Wenjun Zhu (Fudan University, China)
Kun Qiu (Fudan University, China)
Zhe Chen (Fudan University, China)
Yue Gao (Fudan University, China)
Hurry: Dynamic Collaborative Framework For Low-orbit Mega-Constellation Data Downloading (abstract)
Yibing Lin (Tongji University, China)
Binbin Feng (Tongji University, China)
Zhijun Ding (Tongji University, China)
Context-aware Runtime Type Prediction for Heterogeneous Microservices (abstract)
Yuandou Wang (Multiscale Networked Systems, University of Amsterdam, Netherlands)
Neel Kanwal (Department of Electrical Engineering and Computer Science, University of Stavanger, Norway)
Kjersti Engan (Department of Electrical Engineering and Computer Science, University of Stavanger, Norway)
Chunming Rong (Department of Electrical Engineering and Computer Science, University of Stavanger, Norway)
Paola Grosso (Multiscale Networked Systems, University of Amsterdam, Netherlands)
Zhiming Zhao (Multiscale Networked Systems, University of Amsterdam, Netherlands)
PriCE: Privacy-Preserving and Cost-Effective Scheduling for Parallelizing the Large Medical Image Processing Workflow over Hybrid Clouds (abstract)
PRESENTER: Yuandou Wang
Yiming Yao (Peking University, China)
Yingwei Luo (Peking University, China)
Xiaolin Wang (Peking University, China)
Zhenlin Wang (Michigan Technological University, United States)
Liujia Li (Peking University, China)
Jianyu Wu (Peking University, China)
Liren Zhu (Peking University, China)
EKRM: Efficient Key-Value Retrieval Method to Reduce Data Lookup Overhead for Redis (abstract)
18:45-20:00 Walking Tour

Did you know that Madrid was a relatively small town, practically unknown outside of Spain before the year 1561?  The city’s fortunes changed that year when it burst upon the scene of European politics by becoming the permanent capital of Spain.  The dynasty at the head of this change was known as the Habsburgs, a family that ruled the country and much of the known world from the 16th to the 18th century, and who were referred to in Spain as the House of Austria.

The historic center of Madrid was built up predominantly during the reign of that same dynasty, and this fascinating walking tour of Madrid de los Austrias takes you through that area, giving you the best introduction to the Spanish capital.

Friday, August 30th


09:00-10:00 Session 9: Keynote: İlkay Altıntaş

Bridging the Data Gaps to Democratize AI in Science, Education and Society

The democratization of Artificial Intelligence (AI) necessitates an ecosystem where data and research infrastructure are seamlessly integrated and universally accessible. This talk gives an overview of why the gaps between these components must be bridged through robust services, so that an inclusive AI landscape can empower diverse research communities and domains. The National Data Platform (NDP) aims to lower the barriers to entry for AI research and applications through an integrated services approach that streamlines AI workflows, from data acquisition to model deployment. This approach underscores the importance of open, extensible, and equitable systems in driving forward the capabilities of AI, ultimately contributing to the resolution of grand scientific and societal challenges. By examining real case studies that leverage open data platforms and scalable research infrastructure, the talk will highlight how composable systems and services in NDP create a platform that empowers users from all backgrounds to engage in meaningful research, learning, and discovery.

Location: Auditorium
10:00-10:30 Coffee Break
12:30-13:30 Lunch Break
13:30-14:50 Session 11A: Programming, Compilers and Performance (I)
Location: -1.A.03
Suren Harutyunyan Gevorgyan (Universitat Autònoma de Barcelona, Spain)
Anna Sikora (Universitat Autònoma de Barcelona, Spain)
Eduardo Cesar (Universitat Autònoma de Barcelona, Spain)
Jiří Filipovič (Institute of Computer Science, Masaryk University, Czechia)
Akash Dutta (Iowa State University, United States)
Ali Jannesari (Iowa State University, United States)
Jordi Alcaraz (University of Oregon, United States)
Efficient Code Region Characterization through Automatic Performance Counters Reduction using Machine Learning Techniques (abstract)
Anju Mongandampulath Akathoott (IIT Madras, India)
Rupesh Nasre (IIT Madras, India)
FlexiGran: Flexible Granularity Locking in Hierarchies (abstract)
Mohammad Zubair (Old Dominion University, United States)
Christoph Bauinger (Intel Corporation, Austria)
ESIMD GPU implementations of Deep Learning Sparse Matrix Kernels (abstract)
13:30-14:50 Session 11B: Multidisciplinary, Domain-Specific and Applied Parallel and Distributed Computing (IV)
Location: Auditorium
Xianlong Zhou (China)
Pei Li (China)
Jiageng Chen (China)
Shixiong Yao (China)
Accelerating Stencil Computation with Fully Homomorphic Encryption Using GPU (abstract)
Helena Schubert da Incarnacao Lima da Silva (University of Brasilia, Brazil)
Maria Clicia Stelling de Castro (State University of Rio de Janeiro, Brazil)
Fabricio Alves Barbosa da Silva (Oswaldo Cruz Foundation, Brazil)
Alba Cristina Magalhaes Alves de Melo (University of Brasilia, Brazil)
A Framework for Automated Parallel Execution of Scientific Multi-Workflow Applications in the Cloud with Work Stealing (abstract)
Guofeng Feng (State Key Lab of Processor, Institute of Computing Technology, Chinese Academy of Sciences, China)
Hongyu Wang (State Key Lab of Processor, Institute of Computing Technology, Chinese Academy of Sciences, China)
Zhuoqiang Guo (State Key Lab of Processor, Institute of Computing Technology, Chinese Academy of Sciences, China)
Mingzhen Li (State Key Lab of Processor, Institute of Computing Technology, Chinese Academy of Sciences, China)
Tong Zhao (State Key Lab of Processor, Institute of Computing Technology, Chinese Academy of Sciences, China)
Zhou Jin (Super Scientific Software Laboratory, China University of Petroleum-Beijing, China)
Weile Jia (State Key Lab of Processor, Institute of Computing Technology, Chinese Academy of Sciences, China)
Guangming Tan (State Key Lab of Processor, Institute of Computing Technology, Chinese Academy of Sciences, China)
Ninghui Sun (State Key Lab of Processor, Institute of Computing Technology, Chinese Academy of Sciences, China)
Accelerating Large-Scale Sparse LU Factorization for RF Circuit Simulation (abstract)
PRESENTER: Guofeng Feng
13:30-14:50 Session 11C: Scheduling, Resource Management, Cloud, Edge Computing, and Workflows (IV)
Location: -1.A.05
Zechun Zhou (University of Science and Technology of China, China)
Jingwei Sun (University of Science and Technology of China, China)
Hengquan Mei (University of Science and Technology of China, China)
Peng Sun (University of Science and Technology of China, China)
Guangzhong Sun (University of Science and Technology of China, China)
DProbe: Profiling and Predicting Multi-Tenant Deep Learning Workloads for GPU Resource Scaling (abstract)
PRESENTER: Zechun Zhou
Haibo Tang (East China Normal University, China)
Huan Zhang (East China Normal University, China)
Zhenyu Zhang (East China Normal University, China)
Zhao Zhang (East China Normal University, China)
Cheqing Jin (East China Normal University, China)
Aoying Zhou (East China Normal University, China)
Towards High-Performance Transactions via Hierarchical Blockchain Sharding (abstract)
Tingkai Liu (University of Illinois Urbana-Champaign, United States)
Huili Tao (University of Illinois Urbana-Champaign, United States)
Yicheng Lu (University of Illinois Urbana-Champaign, United States)
Zhongbo Zhu (University of Illinois Urbana-Champaign, United States)
Marquita Ellis (IBM Research, United States)
Sara Kokkila-Schumacher (IBM Research, United States)
Volodymyr Kindratenko (University of Illinois Urbana-Champaign, United States)
Automated Data Management and Learning-based Scheduling for Ray-based Hybrid HPC-Cloud Systems (abstract)
PRESENTER: Tingkai Liu
13:30-14:50 Session 11D: Architectures and Accelerators (IV)
Location: -1.A.04
Mohammad Hafezan (Lakehead University, Canada)
Reza Jahadi (Lakehead University, Canada)
Ehsan Atoofian (Lakehead University, Canada)
PCTC: Hardware and Software Co-Design for Pruned Capsule Networks on Tensor Cores (abstract)
PRESENTER: Ehsan Atoofian
Chuhui Wang (Zhejiang University, China)
Zewen Ye (Zhejiang University, China)
Haibin Shen (Zhejiang University, China)
Kejie Huang (Zhejiang University, China)
A Folded Computation-in-Memory Accelerator for Fast Polynomial Multiplication in BIKE (abstract)
Leandro Fiorin (Dipartimento di Elettronica e Informazione, Politecnico di Milano, Italy)
Cristina Silvano (Dipartimento di Elettronica e Informazione, Politecnico di Milano, Italy)
MEPAD: A Memory-efficient Parallelized Direct Convolution Algorithm for Deep Neural Networks (abstract)