EURO-PAR 2025: 31ST INTERNATIONAL EUROPEAN CONFERENCE ON PARALLEL AND DISTRIBUTED COMPUTING
PROGRAM

Monday, August 25th

10:30-11:00 Coffee Break
12:30-14:00 Lunch Break
15:30-16:00 Coffee Break
Tuesday, August 26th

10:30-11:00 Coffee Break
12:30-14:00 Lunch Break
15:30-16:00 Coffee Break
Wednesday, August 27th

10:30-11:00 Coffee Break
11:00-12:30 Session 11A: Track 2.1: Scheduling, Resource Management, Cloud, Edge Computing, and Workflows
11:00
Daniel Medeiros (KTH Royal Institute of Technology, Sweden)
Jeremy Williams (KTH Royal Institute of Technology, Sweden)
Jacob Wahlgren (KTH Royal Institute of Technology, Sweden)
Leonardo Saud Maia Leite (KTH Royal Institute of Technology, Sweden)
Ivy Peng (KTH Royal Institute of Technology, Sweden)
ARC-V: Vertical Resource Adaptivity for HPC Workloads in Containerized Environments (abstract)
11:20
Thomas Jakobsche (University of Basel, Switzerland)
Osman S. Simsek (University of Basel, Switzerland)
Jim Brandt (Sandia National Laboratories, United States)
Ann Gentile (Sandia National Laboratories, United States)
Florina M. Ciorba (University of Basel, Switzerland)
An Autonomy Loop for Dynamic HPC Job Time Limit Adjustment (abstract)
11:40
Rajat Bhattarai (Tennessee Tech University, United States)
Howard Pritchard (Los Alamos National Laboratory, United States)
Sheikh Ghafoor (Tennessee Tech University, United States)
Enabling Elasticity in Scientific Workflows for High Performance Computing Systems (abstract)
12:00
Marta Navarro (Universitat Politècnica de València, Spain)
Vicent Pallardó-Julià (Universitat de València, Spain)
Salvador Petit (Universitat Politècnica de València, Spain)
Maria Gomez (Universitat Politècnica de València, Spain)
Julio Sahuquillo (Universitat Politècnica de València, Spain)
WAPA: A Workload-Agnostic CPI-Based Thread-to-Core Allocation Policy (abstract)
11:00-12:30 Session 11B: Track 3.1: Neural Network Acceleration and Optimization
11:00
Yudong Mu (Institute of Computing Technology, Chinese Academy of Sciences, China)
Zhihua Fan (Institute of Computing Technology, Chinese Academy of Sciences, China)
Xiaoxia Yao (China Mobile Research Institute, China)
Wenming Li (Institute of Computing Technology, Chinese Academy of Sciences, China)
Zhiyuan Zhang (Institute of Computing Technology, Chinese Academy of Sciences, China)
Honglie Wang (Institute of Automation, Chinese Academy of Sciences, China)
Xuejun An (Institute of Computing Technology, Chinese Academy of Sciences, China)
Xiaochun Ye (Institute of Computing Technology, Chinese Academy of Sciences, China)
FDHA: Fusion-Driven Heterogeneous Accelerator for Efficient Diffusion Model Inference (abstract)
11:20
Jiale Dong (University of Science and Technology of China, China)
Hao Wu (University of Science and Technology of China, China)
Zihao Wang (University of Science and Technology of China, China)
Wenqi Lou (University of Science and Technology of China, China)
Zhendong Zheng (University of Science and Technology of China, China)
Lei Gong (University of Science and Technology of China, China)
Chao Wang (University of Science and Technology of China, China)
Xuehai Zhou (University of Science and Technology of China, China)
CoQMoE: Co-Designed Quantization and Computation Orchestration for Mixture-of-Experts Vision Transformer on FPGA (abstract)
11:40
Joonyup Kwon (Korea University, South Korea)
Jinhyeok Choi (Korea University, South Korea)
Ngoc-Son Pham (Korea University, South Korea)
Sangwon Shin (Korea University, South Korea)
Taeweon Suh (Korea University, South Korea)
SkipNZ: Non-Zero Value Skipping for Efficient CNN Acceleration (abstract)
12:00
Piyumal Ranawaka (Chalmers University of Technology, Sweden)
Per Stenstrom (Chalmers University of Technology, Sweden)
BATCH-DNN: Adaptive and Dynamic Batching for Multi-DNN Accelerators (abstract)
11:00-12:30 Session 11C: Track 6.1: Memory and I/O Systems
11:00
Yisu Wang (HKUST (GZ), China)
Xinjiao Li (HKUST (GZ), China)
Ruilong Wu (HKUST (GZ), China)
Huangxun Chen (HKUST (GZ), China)
Dirk Kutscher (HKUST (GZ), Germany)
NetSenseML: Network-Adaptive Compression for Efficient Distributed Machine Learning (abstract)
11:20
John W. Romein (Stichting ASTRON (Netherlands Institute for Radio Astronomy), Netherlands)
Breaking the I/O Barrier: 1.2 Tb/s Ethernet Packet Processing on a GPU (abstract)
11:40
Tianyu Wan (Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan, Hubei, China)
Shijia Gong (Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan, Hubei, China)
Yangyang Hu (Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan, Hubei, China)
Jianxi Chen (Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan, Hubei, China)
GECKO: A Write-optimized Hybrid Index based on Disaggregated Memory (abstract)
12:00
Jhonatan Cléto (Universidade Estadual de Campinas (UNICAMP), Brazil)
Guilherme Valarini (Universidade Estadual de Campinas (UNICAMP), Brazil)
Marcio Pereira (Universidade Estadual de Campinas (UNICAMP), Brazil)
Guido Araujo (Universidade Estadual de Campinas (UNICAMP), Brazil)
Hervé Yviquel (Universidade Estadual de Campinas (UNICAMP), Brazil)
Scalable OpenMP Remote Offloading via Asynchronous MPI and Coroutine-Driven Communication (abstract)
12:30-14:00 Lunch Break
14:00-15:00 Session 12A: Track 1.1: Performance Analysis and Simulation
14:00
Anna-Lena Roth (Hochschule Fulda, University of Applied Sciences, Germany)
David James (Hochschule Fulda, University of Applied Sciences, Germany)
Michael Kuhn (Otto von Guericke University Magdeburg, Germany)
Dustin Frisch (Hochschule Fulda, University of Applied Sciences, Germany)
Making MPI Collective Operations Visible: Understanding Their Utility and Algorithmic Insights (abstract)
14:20
Jaewoo Son (Seoul National University, South Korea)
Youngchul Yoon (Seoul National University, South Korea)
Soonhoi Ha (Seoul National University, South Korea)
TSim4CXL: Trace-driven Simulation Framework for CXL-based High-Performance Computing Systems (abstract)
14:40
Solomon Bekele (Argonne National Laboratory, United States)
Aurelio Vivas (Universidad de los Andes, Colombia)
Thomas Applencourt (Argonne National Laboratory, United States)
Kazutomo Yoshii (Argonne National Laboratory, United States)
Swann Perarnau (Argonne National Laboratory, United States)
Servesh Muralidharan (Argonne National Laboratory, United States)
Bryce Allen (Argonne National Laboratory, United States)
Brice Videau (Argonne National Laboratory, United States)
THAPI: Tracing Heterogeneous APIs (abstract)
14:00-15:00 Session 12B: Track 6.2: Learning systems
14:00
Xinrui Yang (Harbin Institute of Technology, Shenzhen, China)
Shaohuai Shi (Harbin Institute of Technology, Shenzhen, China)
SQ-DeAR: Sparsified and Quantized Gradient Compression for Distributed Training (abstract)
14:20
Samuel Wiggins (University of Southern California, United States)
Nikunj Gupta (University of Southern California, United States)
Grace Zgheib (Altera, United States)
Mahesh Iyer (Altera, United States)
Viktor Prasanna (University of Southern California, United States)
Accelerating Independent Multi-Agent Reinforcement Learning on Multi-GPU Platforms (abstract)
14:40
Wenxiang Lin (Harbin Institute of Technology, Shenzhen, China)
Xinglin Pan (The Hong Kong University of Science and Technology, Guangzhou, China)
Shaohuai Shi (Harbin Institute of Technology, Shenzhen, China)
Xuan Wang (Harbin Institute of Technology, Shenzhen, China)
Xiaowen Chu (The Hong Kong University of Science and Technology, Guangzhou, China)
ScheInfer: Efficient Inference of Large Language Models with Task Scheduling on Moderate GPUs (abstract)
15:00-16:00 Coffee Break and Poster Session
16:00-17:30 Session 13A: Track 2.2: Scheduling, Resource Management, Cloud, Edge Computing, and Workflows
16:00
Jianfeng Gu (Technical University of Munich, Germany)
Puxuan Wang (Technical University of Munich, Germany)
Isaac David Núñez Araya (Technical University of Munich, Germany)
Kai Huang (Sun Yat-sen University, China)
Michael Gerndt (Technical University of Munich, Germany)
HAS-GPU: Efficient Hybrid Auto-scaling with Fine-grained GPU Allocation for SLO-aware Serverless Inferences (abstract)
16:20
Yiming Sun (Institute of Computing Technology, Chinese Academy of Sciences, China)
Jiaqi Zhang (Institute of Computing Technology, Chinese Academy of Sciences, China)
Jie Zhang (Institute of Computing Technology, Chinese Academy of Sciences, China)
Huawei Cao (Institute of Computing Technology, Chinese Academy of Sciences, China)
Xuejun An (Institute of Computing Technology, Chinese Academy of Sciences, China)
Xiaochun Ye (Institute of Computing Technology, Chinese Academy of Sciences, China)
CGP-Graphless: Towards Efficient Serverless Graph Processing via CPU-GPU Pipelined Collaboration (abstract)
16:40
Kaicheng Guo (Shanghai Jiao Tong University, China)
Jingyi Chen (Shanghai Jiao Tong University, China)
Yun Wang (Shanghai Jiao Tong University, China)
Semakin Anton (Huawei Technologies Co., Ltd., Russia)
Tovmachenko Dmitry (Huawei Technologies Co., Ltd., Russia)
Jiajie Sheng (Shanghai Jiao Tong University, China)
Jianwen Wei (Shanghai Jiao Tong University, China)
James Lin (Shanghai Jiao Tong University, China)
Zhengwei Qi (Shanghai Jiao Tong University, China)
Haibing Guan (Shanghai Jiao Tong University, China)
Design and Operation of Elastic GPU-pooling on Campus (abstract)
17:00
Mingxuan Liu (Northwestern Polytechnical University, China)
Jianhua Gu (Northwestern Polytechnical University, China)
Tianhai Zhao (Northwestern Polytechnical University, China)
ServerlessRec: Fast Serverless Inference for Embedding-based Recommender Systems with Disaggregated Memory (abstract)
16:00-17:30 Session 13B: Track 6.3: Stream, Image and Sequence Processing
16:00
Apurv Deepak Kulkarni (Center for Scalable Data Analytics and Artificial Intelligence Dresden/Leipzig, Germany)
Siavash Ghiasvand (Center for Scalable Data Analytics and Artificial Intelligence Dresden/Leipzig, Germany)
SProBench: Stream Processing Benchmark for High Performance Computing Infrastructure (abstract)
16:20
Lifeng Yan (Shandong University, China)
Zekun Yin (Shandong University, China)
Qixin Chang (Shandong University, China)
Tong Zhang (Shandong University, China)
Zhisong Wang (Shandong University, China)
Xiaohui Duan (Shandong University, China)
Bertil Schmidt (Johannes Gutenberg University, Germany)
Weiguo Liu (Shandong University, China)
SWBWA: A Highly Efficient NGS Aligner on the New Sunway Architecture (abstract)
16:40
Marie Reinbigler (Télécom SudParis - Institut Polytechnique de Paris, France)
Rishi Sharma (EPFL, Switzerland)
Rafael Pires (EPFL, Switzerland)
Elisabeth Brunet (Télécom SudParis - Institut Polytechnique de Paris, Inria, France)
Anne-Marie Kermarrec (EPFL, Switzerland)
Catalin Fetita (Télécom SudParis - Institut Polytechnique de Paris, France)
Efficient Pyramidal Analysis of Gigapixel Images on a Decentralized Modest Computer Cluster (abstract)
Thursday, August 28th

10:00-10:30 Coffee Break
10:30-12:30 Session 15: Best Paper Session
10:30
Aurélien Delval (SiPearl - Université Paris-Saclay, UVSQ, Li-PaRAD, France)
Pablo de Oliveira Castro (Université Paris-Saclay, UVSQ, Li-PaRAD, France)
William Jalby (Université Paris-Saclay, UVSQ, Li-PaRAD, France)
Etienne Renault (SiPearl, France)
Noise injection for performance bottleneck analysis (abstract)
10:50
Louis-Claude Canon (Univ. Marie et Louis Pasteur, CNRS, institut FEMTO-ST, F-25000 Besançon, France)
Anthony Dugois (Univ. Marie et Louis Pasteur, CNRS, institut FEMTO-ST, F-25000 Besançon, France)
Ismaël Jecker (Univ. Marie et Louis Pasteur, CNRS, institut FEMTO-ST, F-25000 Besançon, France)
Pierre-Cyrille Heam (Univ. Marie et Louis Pasteur, CNRS, institut FEMTO-ST, F-25000 Besançon, France)
Approximation Bounds for SLACK on Identical Parallel Machines (abstract)
11:10
Jiangying Xue (University of Electronic Science and Technology of China, China)
Tianyu Xiong (University of Electronic Science and Technology of China, China)
Lingwei Chao (University of Electronic Science and Technology of China, China)
Ruini Xue (University of Electronic Science and Technology of China, China)
SimPoint+: More Stable, Accurate and Efficient Program Analysis (abstract)
11:30
Xuanzheng Wang (Tsinghua University, China)
Shuo Miao (Tsinghua University, China)
Zihan Zhu (Tsinghua University, China)
Peng Qu (Tsinghua University, China)
Youhui Zhang (Tsinghua University, China)
AlphaSparseTensor: Discovering Faster Sparse Matrix Multiplication Algorithms on GPUs for LLM Inference (abstract)
11:50
Jeffrey Spaan (University of Twente, Netherlands)
Kuan-Hsun Chen (University of Twente, Netherlands)
David A. Bader (NJIT, United States)
Ana-Lucia Varbanescu (University of Twente, Netherlands)
Wedge-Parallel Triangle Counting for GPUs (abstract)
12:10
Abhijeet Sahu (Indian Institute of Technology Tirupati, India)
Andaluri S P V M Aditya (Indian Institute of Technology Tirupati, India)
Gadhamsetty Ramakrishna (Indian Institute of Technology Tirupati, India)
Malleti Sai Nikhil (Indian Institute of Technology Tirupati, India)
Kishore Kothapalli (International Institute of Information Technology Hyderabad, India)
Dip Sankar Banerjee (Indian Institute of Technology Jodhpur, India)
External GPU Biconnected Components (abstract)
12:30-14:00 Lunch Break
14:00-15:30 Session 16A: Track 1.2: Compilers, Optimizations, and Scheduling
14:00
Wei Li (Chongqing University, China)
Ao Ren (Chongqing University, China)
Qingqiu Lan (Chongqing University, China)
Haining Fang (Chongqing University, China)
Zhenyu Wang (Chongqing University, China)
Yujuan Tan (Chongqing University, China)
Kan Zhong (Chongqing University, China)
Duo Liu (Chongqing University, China)
CoSF: A Co-Optimization Framework for Operator Splitting and Fusion (abstract)
14:20
Jie Tong (University of Wisconsin-Madison, United States)
Wan-Luan Lee (University of Wisconsin-Madison, United States)
Umit Yusuf Ogras (University of Wisconsin-Madison, United States)
Tsung-Wei Huang (University of Wisconsin-Madison, United States)
Scalable Code Generation for RTL Simulation of Deep Learning Accelerators with MLIR (abstract)
14:40
Ivo Gabe de Wolff (Utrecht University, Netherlands)
David van Balen (Utrecht University, Netherlands)
Gabriele Keller (Utrecht University, Netherlands)
Scheduling Task and Data Parallelism in Array Languages with Work Assisting (abstract)
15:00
Andre Rauber Du Bois (Universidade Federal de Pelotas, Brazil)
Gerson Cavalheiro (Universidade Federal de Pelotas, Brazil)
Polymorphic Higher-Order GPU Kernels (abstract)
14:00-15:30 Session 16B: Track 4.1: Scalable AI Optimization and Parallel Training
14:00
Xinjue Zheng (Huazhong University of Science and Technology, China)
Zhangqiang Ming (Huazhong University of Science and Technology, China)
Yuchong Hu (Huazhong University of Science and Technology, China)
Chenxuan Yao (Huazhong University of Science and Technology, China)
Wenxiang Zhou (Huazhong University of Science and Technology, China)
Rui Wang (Huazhong University of Science and Technology, China)
Xun Chen (Huazhong University of Science and Technology, China)
Dan Feng (Huazhong University of Science and Technology, China)
Saving Memory via Residual Reduction for DNN Training with Compressed Communication (abstract)
14:20
Jacob Garby (Chalmers University of Technology, Sweden)
Philippas Tsigas (Chalmers University of Technology, Sweden)
Interval-Asynchrony: Delimited Intervals of Localised Asynchrony for Fast Parallel SGD (abstract)
14:40
Sanjif Shanmugavelu (Groq Inc, United Kingdom)
Mathieu Taillefumier (CSCS Swiss National Supercomputing Centre, Switzerland)
Christopher Culver (Groq Inc, United States)
Oscar Hernandez (Oak Ridge National Laboratory, United States)
Vijay Ganesh (Georgia Tech, United States)
Ada Sedova (Oak Ridge National Laboratory, United States)
Robustness of deep learning classification to adversarial input on GPUs: asynchronous parallel accumulation is a source of vulnerability (abstract)
15:00
Matyáš Brabec (Charles University, Czechia)
Jiří Klepl (Charles University, Czechia)
Michal Töpfer (Charles University, Czechia)
Martin Kruliš (Charles University, Czechia)
Tutoring LLM into a Better CUDA Optimizer (abstract)
14:00-15:30 Session 16C: Track 3.2: Architecture
14:00
Hao Lan (Institute of Computing Technology, Chinese Academy of Sciences; ZGC Laboratory; University of Chinese Academy of Sciences, China)
Ziang Zhou (Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Beijing, China)
Qi Zhu (Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Beijing, China)
Wei Yan (Institute of Computing Technology, Chinese Academy of Sciences; ZGC Laboratory; University of Chinese Academy of Sciences, China)
Qinfen Hao (Institute of Computing Technology, Chinese Academy of Sciences; ZGC Laboratory; University of Chinese Academy of Sciences, China)
Xiaochun Ye (Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Beijing, China)
Yong Liu (ZGC Laboratory; Qi-AnXin Technology Group, QAX Security Center, Xicheng District, Beijing, China)
Ninghui Sun (Institute of Computing Technology, Chinese Academy of Sciences; ZGC Laboratory; University of Chinese Academy of Sciences, China)
ParTEE: A Framework for Secure Parallel Computing of RISC-V TEEs (abstract)
14:20
Ruimin Shi (KTH Royal Institute of Technology, Sweden)
Gabin Schieffer (KTH Royal Institute of Technology, Sweden)
Maya Gokhale (Lawrence Livermore National Laboratory, United States)
Pei-Hung Lin (Lawrence Livermore National Laboratory, United States)
Hiren Patel (University of Waterloo, Canada)
Ivy Peng (KTH Royal Institute of Technology, Sweden)
ARM SVE Unleashed: Performance and Insights Across HPC Applications on Nvidia Grace (abstract)
14:40
Jin Pu (Shanghai Jiao Tong University, China)
Shengan Zheng (Shanghai Jiao Tong University, China)
Penghao Sun (Shanghai Jiao Tong University, China)
Guifeng Wang (Shanghai Jiao Tong University, China)
Xin Xie (Shanghai Jiao Tong University, China)
Linpeng Huang (Shanghai Jiao Tong University, China)
CSGC: Collaborative File System Garbage Collection with Computational Storage (abstract)
15:00
Zhenxuan Xiong (National University of Defense Technology, China)
Libo Huang (National University of Defense Technology, China)
Ling Yang (National University of Defense Technology, China)
Hui Guo (National University of Defense Technology, China)
Junhui Wang (National University of Defense Technology, China)
Zheng Zhong (National University of Defense Technology, China)
Songwen Pei (University of Shanghai for Science and Technology, China)
Gang Chen (Sun Yat-sen University, China)
Yongwen Wang (National University of Defense Technology, China)
SONet: Towards Practical Online Neural Network for Enhancing Hard-To-Predict Branches (abstract)
15:30-16:00 Coffee Break
16:00-17:30 Session 17A: Track 3.3: Caching and Memory for ML
16:00
Mengyue Xi (Sun Yat-sen University, China)
Jingyi He (Sun Yat-sen University, China)
Xianwei Zhang (Sun Yat-sen University, China)
CacheC: LLM-based GPU Cache Management to Enhance Kernel Concurrency (abstract)
16:20
Zhaoyang Zeng (Chongqing University, China)
Yujuan Tan (Chongqing University, China)
Jiali Li (Tsinghua University, China)
Zhuoxin Bai (Chongqing University, China)
Kan Zhong (Chongqing University, China)
Duo Liu (Chongqing University, China)
Ao Ren (Chongqing University, China)
Cocache: An Accurate And Low-overhead Dynamic Caching Method for GNNs (abstract)
16:40
Yi Luo (Southwest University of Science and Technology, China)
Yaobin Wang (Southwest University of Science and Technology, China)
Qi Wang (Southwest University of Science and Technology, China)
Yingchen Song (Southwest University of Science and Technology, China)
Huan Wu (Southwest University of Science and Technology, China)
Qingfeng Wang (Southwest University of Science and Technology, China)
Jun Huang (Southwest University of Science and Technology, China)
DCI: An Efficient Workload-Aware Dual-Cache Allocation GNN Inference Acceleration System (abstract)
17:00
Kazi Asifuzzaman (Oak Ridge National Laboratory, United States)
Aaron Young (Oak Ridge National Laboratory, United States)
Prasanna Date (Oak Ridge National Laboratory, United States)
Shruti Kulkarni (Oak Ridge National Laboratory, United States)
Narasinga Rao Miniskar (Oak Ridge National Laboratory, United States)
Matthew Marinella (Arizona State University, United States)
Jeffrey Vetter (Oak Ridge National Laboratory, United States)
ReSpike: A Co-Design Framework for Evaluating SNNs on ReRAM-based Neuromorphic Processors (abstract)
16:00-17:30 Session 17B: Track 2.3: Scheduling, Resource Management, Cloud, Edge Computing, and Workflows
16:00
Yang Xu (University of Science and Technology of China, China)
Zhiwei Yao (University of Science and Technology of China, China)
Hongli Xu (University of Science and Technology of China, China)
Yunming Liao (University of Science and Technology of China, China)
Zuan Xie (University of Science and Technology of China, China)
MPLS: Stacking Diverse Layers into One Model for Decentralized Federated Learning (abstract)
16:20
Roopkatha Banerjee (Indian Institute of Science, India)
Tejus Chandrashekar (Indian Institute of Science, India)
Ananth Eswar (Indian Institute of Science, India)
Yogesh Simmhan (Indian Institute of Science, India)
Federated Learning within Global Energy Budget over Heterogeneous Edge Accelerators (abstract)
16:40
Volodia Parol-Guarino (Centre INRIA de l'Université de Rennes, France)
Nikos Parlavantzas (INSA Rennes, France)
Auction-based Placement of Functions in the Fog at Scale (abstract)
17:00
Giuseppe Coviello (NEC Laboratories America, Inc., United States)
Kunal Rao (NEC Laboratories America, Inc., United States)
Mohammad Khojastepour (NEC Laboratories America, United States)
Srimat T. Chakradhar (NEC Labs, United States)
Bifröst: Peer-to-peer Load-balancing for Function Execution in Agentic AI Systems (abstract)
16:00-17:30 Session 17C: Track 4.2: Efficient AI Inference and Model Serving at Scale
16:00
Ao Chen (Institute of Computing Technology, Chinese Academy of Sciences, China)
Guangli Li (Institute of Computing Technology, Chinese Academy of Sciences, China)
Feng Yu (Institute of Computing Technology, Chinese Academy of Sciences, China)
Xueying Wang (Beijing University of Posts and Telecommunications, China)
Jiacheng Zhao (Institute of Computing Technology at Chinese Academy of Sciences, China)
Huimin Cui (Institute of Computing Technology at Chinese Academy of Sciences, China)
Xiaobing Feng (Institute of Computing Technology at Chinese Academy of Sciences, China)
Jingling Xue (The University of New South Wales, Australia)
TopServe: Task-Operator Co-Scheduling for Efficient Multi-DNN Inference Serving on GPUs (abstract)
16:20
Tianyu Guo (Sun Yat-sen University, China)
Hande Dong (Tencent, China)
Yichong Leng (University of Science and Technology of China, China)
Feng Liu (Tencent, China)
Cheater Lin (Tencent, China)
Nong Xiao (Sun Yat-sen University, China)
Xianwei Zhang (Sun Yat-sen University, China)
EFIM: Efficient Serving of LLMs for Infilling Tasks with Improved KV Cache Reuse (abstract)
16:40
Nicolás Hernández González (Universidad de La Laguna, Spain)
Pedro Antonio Toledo Delgado (Universidad de La Laguna, Spain)
Vicente José Blanco Pérez (Universidad de La Laguna, Spain)
Francisco Carmelo Almeida Rodríguez (Universidad de La Laguna, Spain)
2:4 Pruning on Edge Devices: Performance, Energy Efficiency and Accuracy (abstract)
17:00
Cheng Gu (Shanghai Jiao Tong University, China)
Gang Li (Institute of Automation, Chinese Academy of Sciences, China)
Xuan Zhang (Shanghai Jiao Tong University, China)
Jiayao Ling (Shanghai Jiao Tong University, China)
Xiaolong Lin (Shanghai Jiao Tong University, China)
Zhuoran Song (Shanghai Jiao Tong University, China)
Jian Cheng (Institute of Automation, Chinese Academy of Sciences, China)
Xiaoyao Liang (Shanghai Jiao Tong University, China)
Light-DiT: An Importance-Aware Dynamic Compression Framework for Diffusion Transformers (abstract)
Friday, August 29th

10:00-10:30 Coffee Break
10:30-12:00 Session 19A: Track 2.4: Scheduling, Resource Management, Cloud, Edge Computing, and Workflows
10:30
Yunling Chen (School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China)
Qingyin Lin (School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China)
Zhitao Chen (School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China)
Yang Ou (College of Computer Science and Technology, National University of Defense Technology, Changsha, China)
Zhiguang Chen (School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China)
DynoInfer: Adaptive Resource Orchestration for LLM Inference on Resource-Constrained PCs (abstract)
10:50
Yunlan Wang (Northwestern Polytechnical University, China)
Yutong Liu (Northwestern Polytechnical University, China)
Jianhua Gu (Northwestern Polytechnical University, China)
Tianhai Zhao (Northwestern Polytechnical University, China)
Zhengxiong Hou (Northwestern Polytechnical University, China)
Chengwen Zhong (Northwestern Polytechnical University, China)
Container Workload Prediction Using Deep Domain Adaptation in Transfer Learning (abstract)
11:10
Sunil Kumar (IIIT-Delhi, India)
Vivek Kumar (IIIT-Delhi, India)
KarmaPM: Reward-Driven Power Manager (abstract)
11:30
Nobel Dhar (Kennesaw State University, United States)
Bobin Deng (Kennesaw State University, United States)
Md Romyull Islam (Kennesaw State University, United States)
Xinyue Zhang (Kennesaw State University, United States)
Kazi Fahim Ahmad Nasif (Kennesaw State University, United States)
Kun Suo (Kennesaw State University, United States)
A Sparsity Predicting Approach for General Large Language Models via Activation Pattern Clustering (abstract)
10:30-12:00 Session 19B: Track 4.3: Distributed systems, Compression, and Federated Applications
10:30
Zhichen Feng (Computer Network Information Center, Chinese Academy of Sciences & University of Chinese Academy of Sciences, China)
Xin Zhang (Computer Network Information Center, Chinese Academy of Sciences & University of Chinese Academy of Sciences, China)
DiffNO: Neural Operator Learning using Physically Structured Constrained Diffusion Model (abstract)
10:50
Loris Belcastro (University of Calabria, Italy)
Paolo Ferragina (Sant’Anna School of Advanced Studies, Italy)
Giovanni Manzini (University of Pisa, Italy)
Fabrizio Marozzo (DIMES, University of Calabria, Italy)
Domenico Talia (University of Calabria, Italy)
Paolo Trunfio (DEIS, University of Calabria, Italy)
Scalable Compression of Massive Data Collections on HPC Systems (abstract)
11:10
Sabtain Ahmad (Vienna University of Technology, Austria)
Thomas Schneidergruber (Paris-Lodron University of Salzburg, Austria)
Ivona Brandic (Vienna University of Technology, Austria)
Johannes Scholz (Paris-Lodron University of Salzburg, Austria)
On-Device Federated Learning for Remote Alpine Livestock Monitoring (abstract)
11:30
Rubayet Rahman Rongon (Washington State University, United States)
Xuechen Zhang (Washington State University, United States)
IAUG: Accelerating Augmentation with Importance Sampling in Deep Neural Network Training (abstract)
10:30-12:00 Session 19C: Track 5.1: Theory and Algorithms
10:30
Spyros Angelopoulos (CNRS, France)
Loris Marchal (CNRS, France)
Adrien Obrecht (ENS-Lyon, France)
Bertrand Simon (CNRS, France)
Cache Management for Mixture-of-Experts LLMs (abstract)
10:50
Atte Torri (Université Paris-Saclay, LISN, CNRS, France)
Przemyslaw Dominikowski (Université Paris-Saclay, Inria, France)
Brice Pointal (Université Paris-Saclay, LISN, CNRS, France)
Oguz Kaya (Université Paris-Saclay, LISN, CNRS, France)
Laercio Lima Pilla (Université de Bordeaux, CNRS, Bordeaux INP, Inria, LaBRI, France)
Olivier Coulaud (Inria, France)
Near-optimal contraction strategies for the scalar product in the tensor-train format (abstract)
11:10
John Augustine (IIT Madras, India)
Christian Scheideler (Paderborn University, Germany)
Julian Werthmann (Paderborn University, Germany)
Supervised Distributed Computing (abstract)
11:30
Anne Benoit (ENS Lyon - LIP, France)
Thomas Herault (INRIA, France)
Yves Robert (ENS Lyon, France)
Alix Tremodeux (ENS Lyon, France)
Partial Detectors Versus Replication To Cope With Silent Errors (abstract)
10:30-12:00 Session 19D: Track 6.4: Graph Algorithms and Linear Algebra
10:30
Chao Wang (University of Science and Technology of China, China)
Haijie Hou (University of Science and Technology of China, China)
Longsheng Song (University of Science and Technology of China, China)
Junshi Chen (University of Science and Technology of China, China)
Hong An (University of Science and Technology of China, China)
Dongdong Tan (University of Science and Technology of China, China)
Yueqiang He (University of Science and Technology of China, China)
Sihan Lu (University of Science and Technology of China, China)
Uniform Dense Blocking for Efficient Sparse LU Factorization in First-principles Materials Simulation (abstract)
10:50
Soumyajit Chatterjee (Indian Institute of Technology, Hyderabad, India)
Rahul Utkoor (Qualcomm Innovation Center, Hyderabad, India)
Eshwar Uppu (Indian Institute of Technology, Hyderabad, India)
Sathya Peri (Indian Institute of Technology, Hyderabad, India)
Venkata Krishna Nandivada (Indian Institute of Technology, Madras, India)
Efficient Task Graph Scheduling for Parallel QR Factorization in SLSQP (abstract)
11:10
Florian Willich (Humboldt-Universität zu Berlin, Germany)
Henning Meyerhenke (Humboldt-Universität zu Berlin, Germany)
ScaleRunner: A Fast MPI-based Random Walk Engine for Multi-CPU Systems (abstract)
12:00-13:30 Lunch Break
13:30-14:30 Session 20A: Track 2.5: Scheduling, Resource Management, Cloud, Edge Computing, and Workflows
13:30
Olivier Beaumont (Inria, France)
Raphael Bourgouin (Inria, France)
Maxime Darrin (Mistral AI, France)
Loris Marchal (IRL ILLS (CNRS, McGill, ETS Montreal), Canada)
Pablo Piantanida (IRL ILLS (CNRS, McGill, ETS Montreal), Canada)
Leveraging Expert Usage to Speed up LLM Inference with Expert Parallelism (abstract)
13:50
Ana Gainaru (Oak Ridge National Laboratory, United States)
Scott Klasky (ORNL, United States)
Guillaume Pallez (INRIA, France)
Priority-BF: a Task Manager for Priority-Based Scheduling (abstract)
14:10
Joachim Cendrier (ENS Lyon, France)
Rajini Wijayawardana (University of Chicago, United States)
Anne Benoit (ENS Lyon, France)
Yves Robert (ENS Lyon, France)
Frédéric Vivien (INRIA, France)
Andrew Chien (University of Chicago, United States)
Green Scheduling on the Edge (abstract)
13:30-14:30 Session 20B: Track 5.2: Theory and Algorithms
13:30
Chryssis Georgiou (University of Cyprus, Cyprus)
Piduguralla Manaswini (IIT Hyderabad, India)
Sathya Peri (Indian Institute of Technology Hyderabad, India)
Byzantine-Tolerant Consensus in GPU-Inspired Shared Memory (abstract)
13:50
Thomas Koopman (Radboud University, Netherlands)
Sven-Bodo Scholz (Radboud University, Netherlands)
Bernard van Gastel (Radboud University, Netherlands)
Partitioning In-Place on Massively Parallel Systems (abstract)
13:30-14:30 Session 20C: Track 6.5: GPU and Quantum Systems
13:30
Massimiliano Meneghin (Autodesk Research, Italy)
Ahmed Mahmoud (MIT, United States)
Disaggregated Design for GPU-Based Volumetric Data Structures (abstract)
13:50
Jiale Zhang (Jilin University, China)
Xilong Che (Jilin University, China)
Yuzhe Fan (Jilin University, China)
Juncheng Hu (Jilin University, China)
Quantum Delta Encoding: Optimizing Data Storage on Quantum Computers with Resource Efficiency (abstract)
14:10
Yi-Hua Chung (University of Wisconsin at Madison, United States)
Shui Jiang (The Chinese University of Hong Kong, Hong Kong)
Wan Luan Lee (University of Wisconsin at Madison, United States)
Yanqing Zhang (Nvidia Corporation, United States)
Haoxing Ren (Nvidia Corporation, United States)
Tsung-Yi Ho (The Chinese University of Hong Kong, Hong Kong)
Tsung-Wei Huang (University of Wisconsin at Madison, United States)
SimPart: A Simple Yet Effective Replication-aided Partitioning Algorithm for Logic Simulation on GPU (abstract)