Talk Keyword Index

This page contains an index consisting of author-provided keywords.

A
a.	Towards High Level Synthesis Support in the European Deep Learning Library
Accelerator computing	Accelerating FFT using NEC SX-Aurora Vector Engine
Accelerators	FleCSI 2.0: The Flexible Computational Science Infrastructure Project
Adaptive Optics	Outsmarting the Atmospheric Turbulence for Ground-Based Telescopes Using the Stochastic Levenberg-Marquardt Method
Address Clustering	Decentralisation Over Privacy: An Analysis Of The Bisq Trade Protocol
address sampling	Low-Overhead Reuse Distance Profiling Tool for Multicore
Affinity	Exploring Strategies to Improve Locality Across Many-core Affinities
AIOps	Parallelizing Automatic Model Management System for AIOps on Microservice Platforms
All-Substrings LCS	A Scalable Approximation Algorithm for Weighted Longest Common Subsequence
AllScale	On using modern C++ and nested recursive task parallelism for HPC applications with AllScale
AMPI	Enabling support for zero copy semantics in an Asynchronous Task-based Programming Model
AMTs	Understanding the Effect of Task Granularity on Execution Time in Asynchronous Many-Task Runtime Systems
Analytical model	Understanding the Effect of Task Granularity on Execution Time in Asynchronous Many-Task Runtime Systems
Anomaly Diagnosis	E2EWatch: An End-to-end Anomaly Diagnosis Framework for Production HPC Systems
Applied Mathematics	FleCSI 2.0: The Flexible Computational Science Infrastructure Project
approximate computing	ALONA: Automatic Loop Nest Approximation with Reconstruction and Space Pruning
	Model-based Loop Perforation
Approximation algorithm	A log-linear (2+5/6)-approximation algorithm for parallel machine scheduling with a single orthogonal resource
Architecture-aware Pruning	A GPU Architecture Aware Fine-Grain Pruning Technique for Deep Neural Networks
Asymptotic performance	Update on the Asymptotic Optimality of LPT
Asynchronous Parallelism	Particle-In-Cell Simulation using Asynchronous Tasking
Asynchronous task execution	Domain-Specific Runtime to Orchestrate Computation on Heterogeneous Platforms
Asynchronous Tasking	Enabling support for zero copy semantics in an Asynchronous Task-based Programming Model
Asynchrony	The Past, Present, and Future of Asynchrony in C++
Aurora-SX	Accelerating FFT using NEC SX-Aurora Vector Engine
auto-scheduling	Parallelization and auto-scheduling of data access queries in ML workloads
Auto-tuning	Towards Flexible and Compiler-Friendly Layer Fusion for CNNs on Multicore CPUs
Autonomous control	Continuous Self-Adaptation of Control Policies in Automatic Cloud Management
B
b.	Towards High Level Synthesis Support in the European Deep Learning Library
Benchmark testing	Optimized Implementation of the HPCG Benchmark on Reconfigurable Hardware
benchmarking	Trace-driven Workload Generation and Execution
Benford's Law	Characterizing Memory Failures Using Benford’s Law
Bi-objective optimization	A Novel Bi-Objective Optimization Algorithm on Heterogeneous HPC Platforms for Applications with Continuous Performance and Linear Energy Profiles
Big Data	Big Data and Extreme-Scales: Computational Science in the 21st Century
Bioinformatics	HPC for Bioinformatics: The Genetic Sequence Comparison Quest for Performance
BLAS	OpenMP target task: tasking and target offloading on heterogeneous systems
Blockchain	DoS Attacks on Blockchain Ecosystem
	Towards a graphical DSL for tracing supply chains on blockchain
	Integrating Fog Computing and Blockchain Technology for Applications with Enhanced Trust and Privacy
	Smart Contract Based Public Procurement to Fight Corruption
Blockchain Analysis	Decentralisation Over Privacy: An Analysis Of The Bisq Trade Protocol
breadth-first search	G-Morph: Induced Subgraph Isomorphism Search of Labeled Graphs on a GPU
Buffering	Exploring Strategies to Improve Locality Across Many-core Affinities
Burst buffers	A log-linear (2+5/6)-approximation algorithm for parallel machine scheduling with a single orthogonal resource
C
C++	On using modern C++ and nested recursive task parallelism for HPC applications with AllScale
	The Past, Present, and Future of Asynchrony in C++
c.	Towards High Level Synthesis Support in the European Deep Learning Library
Cache-Oblivious	Exploring Strategies to Improve Locality Across Many-core Affinities
Capacity Planning	Rafiki: Task-level Capacity Planning in Distributed Stream Processing Systems
ccNUMA	Heterogeneous Voltage Frequency Scaling of Data-Parallel Applications for Energy Saving on Homogeneous Multicore Platforms
CFD simulations	Merging Real Images with Physics Simulations via Data Assimilation
Charm++	Enabling support for zero copy semantics in an Asynchronous Task-based Programming Model
checkpointing	Towards High Performance Resilience using Performance Portable Abstractions
chip multiprocessor	PrioRAT: Criticality-Driven Prioritization Inside the On-Chip Memory Hierarchy
Closing	Closing of the Euro-Par 2021
Cloud	Colony: Parallel Functions as a Service on the Cloud-Edge Continuum
	Integrating Fog Computing and Blockchain Technology for Applications with Enhanced Trust and Privacy
Cloud Computing	Trace-driven Workload Generation and Execution
	A Fault Tolerant and Deadline Constrained Sequence Alignment Application on Cloud-based Spot GPU Instances
	Collaborative, distributed, scalable and low-cost platform based on microservices, containers, mobile devices and Cloud services to solve compute-intensive tasks
Cloud Scaling	Horizontal Scaling in Cloud using Contextual Bandits
clustering	GPU Accelerated Mahalanobis-average Hierarchical Clustering Analysis
CNN	Towards Flexible and Compiler-Friendly Layer Fusion for CNNs on Multicore CPUs
Co-execution	Exploiting co-execution with oneAPI: heterogeneity from a modern perspective
	Feasibility study of Molecular Dynamics kernels exploitation using EngineCL
Collaboration	Geo-Distribute Cloud Applications at the Edge
Collaborative Computing	Collaborative, distributed, scalable and low-cost platform based on microservices, containers, mobile devices and Cloud services to solve compute-intensive tasks
Collective behavior	An Automata-based Approach to Profit Optimization of Cloud Brokers in IaaS Environment
communication delays	A Fixed-Parameter Algorithm for Scheduling Unit dependent Tasks with Unit Communication Delays
Communication Optimizations	Enabling support for zero copy semantics in an Asynchronous Task-based Programming Model
Community engagement	SMART: a Tool for Trust and Reputation Management in Social Media
compiler	ALONA: Automatic Loop Nest Approximation with Reconstruction and Space Pruning
	Model-based Loop Perforation
Complexity	Pipelined Model Parallelism: Complexity Results and Memory Considerations
Computational Astronomy	Outsmarting the Atmospheric Turbulence for Ground-Based Telescopes Using the Stochastic Levenberg-Marquardt Method
computational model	Algorithm design for Tensor Units
Computational Science	FleCSI 2.0: The Flexible Computational Science Infrastructure Project
	Big Data and Extreme-Scales: Computational Science in the 21st Century
Compute Continuum	Colony: Parallel Functions as a Service on the Cloud-Edge Continuum
Computing clouds	Continuous Self-Adaptation of Control Policies in Automatic Cloud Management
Concurrency	TSLQueue: An Efficient Lock-free Design for Priority Queues
Conference	Welcome
Containers	Collaborative, distributed, scalable and low-cost platform based on microservices, containers, mobile devices and Cloud services to solve compute-intensive tasks
Contextual Bandits	Horizontal Scaling in Cloud using Contextual Bandits
Control theory	Sustaining Performance While Reducing Energy Consumption: A Control Theory Approach
Cost distributions	Update on the Asymptotic Optimality of LPT
Coughing and Sneezing simulations	Merging Real Images with Physics Simulations via Data Assimilation
COVID-19	Data management in EpiGraph COVID-19 epidemic simulator
Covid-19 Diffusion	Merging Real Images with Physics Simulations via Data Assimilation
cross-chain communication	Towards A Broadcast Time-Lock Based Token Exchange Protocol
CUDA	GPU Accelerated Mahalanobis-average Hierarchical Clustering Analysis
	A Low Overhead Tasking Model for OpenMP
	OpenMP target task: tasking and target offloading on heterogeneous systems
CUDA Graph	An Experimental Study of SYCL Task Graph Parallelism for Large-Scale Machine Learning Workloads
D
data access queries	Parallelization and auto-scheduling of data access queries in ML workloads
Data Analytics	ECP: Data Analytics and Optimization Applications On Accelerator-Based Systems
Data Assimilation	Merging Real Images with Physics Simulations via Data Assimilation
data layout	High Performance Computing with Java Streams
data locality	High Performance Computing with Java Streams
data-parallel applications	Heterogeneous Voltage Frequency Scaling of Data-Parallel Applications for Energy Saving on Homogeneous Multicore Platforms
debug registers	Low-Overhead Reuse Distance Profiling Tool for Multicore
Decentralised Exchange	Decentralisation Over Privacy: An Analysis Of The Bisq Trade Protocol
decentralized exchange	Towards A Broadcast Time-Lock Based Token Exchange Protocol
deep learning	Extracting Information from Large Scale Graph Data: Case Study on Automated UI Testing
Deep neural network	Elastic Deep Learning using Knowledge Distillation with Heterogeneous Computing Resources
Deep Neural Networks	Pipelined Model Parallelism: Complexity Results and Memory Considerations
Deep Reinforcement Learning	Continuous Self-Adaptation of Control Policies in Automatic Cloud Management
Denial-of-service	DoS Attacks on Blockchain Ecosystem
Deployment	E2EWatch: An End-to-end Anomaly Diagnosis Framework for Production HPC Systems
design patterns	RDPM: An Extensible Tool for Resilience Design Patterns Modeling
Digital Twin	Continuous Self-Adaptation of Control Policies in Automatic Cloud Management
Directives	Enhancing Load-Balancing of MPI Applications with Workshare
DISSECT-CF	Towards Generating Realistic Trace for Simulating Functions-as-a-Service
Distributed Algorithm	Efficient and Systematic Partitioning of Large and Deep Neural Networks for Parallelization
Distributed computing	Elastic Deep Learning using Knowledge Distillation with Heterogeneous Computing Resources
distributed e-business workflows	Extracting Information from Large Scale Graph Data: Case Study on Automated UI Testing
Distributed Machine Learning	Efficient and Systematic Partitioning of Large and Deep Neural Networks for Parallelization
Distributed memory systems	Communication overlapping Pipelined Conjugate Gradients for Distributed Memory Systems and Heterogeneous Architectures
Distributed Shared Memory	Data management model to program irregular compute kernels on FPGA: application to heterogeneous distributed system
Distributed Stream Processing	Rafiki: Task-level Capacity Planning in Distributed Stream Processing Systems
Distributed Systems	Colony: Parallel Functions as a Service on the Cloud-Edge Continuum
	Extracting Information from Large Scale Graph Data: Case Study on Automated UI Testing
DNN	A GPU Architecture Aware Fine-Grain Pruning Technique for Deep Neural Networks
Domain decomposition	Scalable hybrid parallel ILU preconditioner to solve sparse linear systems
Domain Specific Language	Geo-Distribute Cloud Applications at the Edge
Domain-specific runtime	Domain-Specific Runtime to Orchestrate Computation on Heterogeneous Platforms
DPC++	Exploiting co-execution with oneAPI: heterogeneity from a modern perspective
E
EASY backfilling	Plan-based Job Scheduling for Supercomputers with Shared Burst Buffers
Edge	Colony: Parallel Functions as a Service on the Cloud-Edge Continuum
Edge computing	Geo-Distribute Cloud Applications at the Edge
efficient algorithms	Algorithm design for Tensor Units
Emerging hardware	SparCity: Optimizing Sparse Computation and Graphs for Novel Parallel Architectures
Empirical evaluation	Update on the Asymptotic Optimality of LPT
emulation	Trace-driven Workload Generation and Execution
Energy Efficiency	Energy-Efficient Execution of Streaming Task Graphs with Parallelizable Tasks on Multicore Platforms with Core Failures
Energy optimization	A Novel Bi-Objective Optimization Algorithm on Heterogeneous HPC Platforms for Applications with Continuous Performance and Linear Energy Profiles
energy saving	Heterogeneous Voltage Frequency Scaling of Data-Parallel Applications for Energy Saving on Homogeneous Multicore Platforms
Epidemiological simulation	Data management in EpiGraph COVID-19 epidemic simulator
Euro-Par 2021	Closing of the Euro-Par 2021
Euro-Par 2022	Presenting the Euro-Par 2022
Eviction policy	Locality-Aware Scheduling of Independent Tasks for Runtime Systems
Exascale Computing Project	ECP: Data Analytics and Optimization Applications On Accelerator-Based Systems
External trees	TSLQueue: An Efficient Lock-free Design for Priority Queues
Extreme-Scales	Big Data and Extreme-Scales: Computational Science in the 21st Century
F
FaaS	Colony: Parallel Functions as a Service on the Cloud-Edge Continuum
	Towards Generating Realistic Trace for Simulating Functions-as-a-Service
Factorized Sparse Approximate Inverse	Mixed Precision Incomplete and Factorized Sparse Approximate Inverse Preconditioning on GPUs
Failure Characterization	Characterizing Memory Failures Using Benford’s Law
Failure Distribution	Characterizing Memory Failures Using Benford’s Law
Farewell	Closing of the Euro-Par 2021
fault tolerance	Fault-tolerant LU factorisation is low cost
	Towards High Performance Resilience using Performance Portable Abstractions
	Application-Based Fault Tolerance for Numerical Linear Algebra at Large Scale
Fault-aware resource allocation	Exploring the impact of node failures on the resource allocation for parallel jobs
Fault-tolerant Execution	Energy-Efficient Execution of Streaming Task Graphs with Parallelizable Tasks on Multicore Platforms with Core Failures
FFT	Exploring Strategies to Improve Locality Across Many-core Affinities
	Accelerating FFT using NEC SX-Aurora Vector Engine
Field Programmable Gate Array	Data management model to program irregular compute kernels on FPGA: application to heterogeneous distributed system
Fine-grain Parallelism	A Low Overhead Tasking Model for OpenMP
fixed-parameter algorithm	A Fixed-Parameter Algorithm for Scheduling Unit dependent Tasks with Unit Communication Delays
Fog	Colony: Parallel Functions as a Service on the Cloud-Edge Continuum
	Integrating Fog Computing and Blockchain Technology for Applications with Enhanced Trust and Privacy
frequency scaling	Heterogeneous Voltage Frequency Scaling of Data-Parallel Applications for Energy Saving on Homogeneous Multicore Platforms
G
GAN	Extracting Information from Large Scale Graph Data: Case Study on Automated UI Testing
GASPI	Scalable hybrid parallel ILU preconditioner to solve sparse linear systems
Genetic Sequence Comparison	HPC for Bioinformatics: The Genetic Sequence Comparison Quest for Performance
Ginkgo	Porting Sparse Linear Algebra to Intel GPUs
Glasgow	Presenting the Euro-Par 2022
GPU	GPU Accelerated Mahalanobis-average Hierarchical Clustering Analysis
	A GPU Architecture Aware Fine-Grain Pruning Technique for Deep Neural Networks
	Collaborative GPU Preemption via Spatial Multitasking for Efficient GPU Sharing
	Model-based Loop Perforation
	A Low Overhead Tasking Model for OpenMP
	Towards an efficient sparse storage format for the SpMM kernel in GPUs
GPU Computing	Outsmarting the Atmospheric Turbulence for Ground-Based Telescopes Using the Stochastic Levenberg-Marquardt Method
GPU preemption	Collaborative GPU Preemption via Spatial Multitasking for Efficient GPU Sharing
GPU programming	G-Morph: Induced Subgraph Isomorphism Search of Labeled Graphs on a GPU
GPU sharing	Collaborative GPU Preemption via Spatial Multitasking for Efficient GPU Sharing
Graph AI	Knowledge Graphs, Graph AI, and the Need for High-performance Graph Computing
graph databases	G-Morph: Induced Subgraph Isomorphism Search of Labeled Graphs on a GPU
graph problems	Algorithm design for Tensor Units
Graph Processing	Accelerating Graph Applications Using Phased Transactional Memory
Graphical DSL	Towards a graphical DSL for tracing supply chains on blockchain
Graphs	SparCity: Optimizing Sparse Computation and Graphs for Novel Parallel Architectures
H
hardware accelerators	Algorithm design for Tensor Units
hardware performance counters	Low-Overhead Reuse Distance Profiling Tool for Multicore
Hardware Transactional Memory	Accelerating Graph Applications Using Phased Transactional Memory
Heterogeneity	FleCSI 2.0: The Flexible Computational Science Infrastructure Project
	OpenMP target task: tasking and target offloading on heterogeneous systems
Heterogeneous architectures	Communication overlapping Pipelined Conjugate Gradients for Distributed Memory Systems and Heterogeneous Architectures
Heterogeneous computing	Exploiting co-execution with oneAPI: heterogeneity from a modern perspective
	Efficient GPU Computation using Task Graph Parallelism
	Kernel Fusion in OpenCL
	Feasibility study of Molecular Dynamics kernels exploitation using EngineCL
Heterogeneous data processing	Data management in EpiGraph COVID-19 epidemic simulator
Heterogeneous platforms	Domain-Specific Runtime to Orchestrate Computation on Heterogeneous Platforms
heterogeneous voltage	Heterogeneous Voltage Frequency Scaling of Data-Parallel Applications for Energy Saving on Homogeneous Multicore Platforms
hierarchical architectures	An MPI-Parallel Algorithm for Mapping Complex Networks onto Hierarchical Architectures
High performance clouds	Network SLO for High Performance Clouds
High Performance computing	Optimized Implementation of the HPCG Benchmark on Reconfigurable Hardware
	Application-Based Fault Tolerance for Numerical Linear Algebra at Large Scale
High performance computing (HPC)	Plan-based Job Scheduling for Supercomputers with Shared Burst Buffers
High Performance Conjugate Gradient	Optimized Implementation of the HPCG Benchmark on Reconfigurable Hardware
high-dimensional	GPU Accelerated Mahalanobis-average Hierarchical Clustering Analysis
high-performance computing	Fault-tolerant LU factorisation is low cost
	RDPM: An Extensible Tool for Resilience Design Patterns Modeling
High-performance Graph Computing	Knowledge Graphs, Graph AI, and the Need for High-performance Graph Computing
HPC	Exploiting co-execution with oneAPI: heterogeneity from a modern perspective
	Enhancing Load-Balancing of MPI Applications with Workshare
	E2EWatch: An End-to-end Anomaly Diagnosis Framework for Production HPC Systems
	Automatic low-overhead load-imbalance detection in MPI applications
	Interferences between Communications and Computations in Distributed HPC Systems
	HPC for Bioinformatics: The Genetic Sequence Comparison Quest for Performance
HPC Systems	Sustaining Performance While Reducing Energy Consumption: A Control Theory Approach
HPCG	Optimized Implementation of the HPCG Benchmark on Reconfigurable Hardware
HPX	Understanding the Effect of Task Granularity on Execution Time in Asynchronous Many-Task Runtime Systems
Hybrid parallel computing	Feasibility study of Molecular Dynamics kernels exploitation using EngineCL
I
IaaS	SPIRIT: A microservice-based framework for interactive Cloud infrastructure planning
IaC	SPIRIT: A microservice-based framework for interactive Cloud infrastructure planning
IC-PCP	SPIRIT: A microservice-based framework for interactive Cloud infrastructure planning
Impact of node failures on MPI parallel jobs	Exploring the impact of node failures on the resource allocation for parallel jobs
Incomplete Sparse Approximate Inverse	Mixed Precision Incomplete and Factorized Sparse Approximate Inverse Preconditioning on GPUs
Intel GPUs	Porting Sparse Linear Algebra to Intel GPUs
Inter-GPU communication	Monitoring Collective Communication Among GPUs
Irregular application	Data management model to program irregular compute kernels on FPGA: application to heterogeneous distributed system
J
Java parallel streams	High Performance Computing with Java Streams
K
Kernel Fusion	Kernel Fusion in OpenCL
kernel perforation	Model-based Loop Perforation
Key-Value Store	Taming Tail Latency in Key-Value Stores: a Scheduling Perspective
Keyword 1	European Processor Initiative and EU Projects Towards Exascale Computing
	EPEEC: Europe toward High Coding Productivity for Exascale
Keyword 2	European Processor Initiative and EU Projects Towards Exascale Computing
	EPEEC: Europe toward High Coding Productivity for Exascale
Keyword 3	European Processor Initiative and EU Projects Towards Exascale Computing
	EPEEC: Europe toward High Coding Productivity for Exascale
Keyword1	Data-Centric Python - Productivity, portability and all with high performance!
Keyword2	Data-Centric Python - Productivity, portability and all with high performance!
Keyword3	Data-Centric Python - Productivity, portability and all with high performance!
Knowledge distillation	Elastic Deep Learning using Knowledge Distillation with Heterogeneous Computing Resources
Knowledge Graphs	Knowledge Graphs, Graph AI, and the Need for High-performance Graph Computing
Kubernetes	Collaborative, distributed, scalable and low-cost platform based on microservices, containers, mobile devices and Cloud services to solve compute-intensive tasks
L
large scale graph data	Extracting Information from Large Scale Graph Data: Case Study on Automated UI Testing
Lattice Boltzmann method	Designing a 3D Parallel Memory-Aware Lattice Boltzmann Algorithm on Manycore Systems
Layer fusion	Towards Flexible and Compiler-Friendly Layer Fusion for CNNs on Multicore CPUs
LCS	A Scalable Approximation Algorithm for Weighted Longest Common Subsequence
linear algebra	Fault-tolerant LU factorisation is low cost
	Algorithm design for Tensor Units
	Application-Based Fault Tolerance for Numerical Linear Algebra at Large Scale
	OpenMP target task: tasking and target offloading on heterogeneous systems
Linked lists	TSLQueue: An Efficient Lock-free Design for Priority Queues
Lisbon	Welcome
llvm	ALONA: Automatic Loop Nest Approximation with Reconstruction and Space Pruning
load balancing	Exploiting co-execution with oneAPI: heterogeneity from a modern perspective
	An MPI-Parallel Algorithm for Mapping Complex Networks onto Hierarchical Architectures
	Feasibility study of Molecular Dynamics kernels exploitation using EngineCL
Load Imbalance	Automatic low-overhead load-imbalance detection in MPI applications
Load-balancing	Enhancing Load-Balancing of MPI Applications with Workshare
Lock-freedom	TSLQueue: An Efficient Lock-free Design for Priority Queues
Longest Processing Time (LPT) heuristic	Update on the Asymptotic Optimality of LPT
loop optimization	Model-based Loop Perforation
Lower Bound	Taming Tail Latency in Key-Value Stores: a Scheduling Perspective
LU	Fault-tolerant LU factorisation is low cost
	Application-Based Fault Tolerance for Numerical Linear Algebra at Large Scale
M
Machine Learning	E2EWatch: An End-to-end Anomaly Diagnosis Framework for Production HPC Systems
	Parallelization and auto-scheduling of data access queries in ML workloads
Mahalanobis distance	GPU Accelerated Mahalanobis-average Hierarchical Clustering Analysis
makespan	A Fixed-Parameter Algorithm for Scheduling Unit dependent Tasks with Unit Communication Delays
Manycore Parallelism	Particle-In-Cell Simulation using Asynchronous Tasking
manycore systems	Designing a 3D Parallel Memory-Aware Lattice Boltzmann Algorithm on Manycore Systems
Math library	Porting Sparse Linear Algebra to Intel GPUs
matrix factorizations	Fault-tolerant LU factorisation is low cost
	Application-Based Fault Tolerance for Numerical Linear Algebra at Large Scale
Memory Contention	Interferences between Communications and Computations in Distributed HPC Systems
memory hierarchy	PrioRAT: Criticality-Driven Prioritization Inside the On-Chip Memory Hierarchy
memory traffic prioritization	PrioRAT: Criticality-Driven Prioritization Inside the On-Chip Memory Hierarchy
memory-aware algorithms	Designing a 3D Parallel Memory-Aware Lattice Boltzmann Algorithm on Manycore Systems
Memory-aware scheduling	Locality-Aware Scheduling of Independent Tasks for Runtime Systems
METIS	Scalable hybrid parallel ILU preconditioner to solve sparse linear systems
microservice	SPIRIT: A microservice-based framework for interactive Cloud infrastructure planning
	Parallelizing Automatic Model Management System for AIOps on Microservice Platforms
Microservices	Geo-Distribute Cloud Applications at the Edge
Min-max optimization	A Novel Bi-Objective Optimization Algorithm on Heterogeneous HPC Platforms for Applications with Continuous Performance and Linear Energy Profiles
Min-sum optimization	A Novel Bi-Objective Optimization Algorithm on Heterogeneous HPC Platforms for Applications with Continuous Performance and Linear Energy Profiles
Mixed Precision	Mixed Precision Incomplete and Factorized Sparse Approximate Inverse Preconditioning on GPUs
MLOps	Parallelizing Automatic Model Management System for AIOps on Microservice Platforms
Mobile Computing	Collaborative, distributed, scalable and low-cost platform based on microservices, containers, mobile devices and Cloud services to solve compute-intensive tasks
Model management pipeline	Parallelizing Automatic Model Management System for AIOps on Microservice Platforms
model parallelism	Pipelined Model Parallelism: Complexity Results and Memory Considerations
	Memory Efficient Deep Neural Network Training
Molecular dynamics	Feasibility study of Molecular Dynamics kernels exploitation using EngineCL
MPI	Enhancing Load-Balancing of MPI Applications with Workshare
	Interferences between Communications and Computations in Distributed HPC Systems
Multi-agent systems	An Automata-based Approach to Profit Optimization of Cloud Brokers in IaaS Environment
Multi-GPUs	Monitoring Collective Communication Among GPUs
multi-resource scheduling	Plan-based Job Scheduling for Supercomputers with Shared Burst Buffers
multicore	Heterogeneous Voltage Frequency Scaling of Data-Parallel Applications for Energy Saving on Homogeneous Multicore Platforms
Multicore CPUs	Towards Flexible and Compiler-Friendly Layer Fusion for CNNs on Multicore CPUs
Multilevel Memory	Exploring Strategies to Improve Locality Across Many-core Affinities
Multiphysics	FleCSI 2.0: The Flexible Computational Science Infrastructure Project
N
NEC	Accelerating FFT using NEC SX-Aurora Vector Engine
Network performance	Network SLO for High Performance Clouds
Neural Network Partitioning	Efficient and Systematic Partitioning of Large and Deep Neural Networks for Parallelization
neural networks	Memory Efficient Deep Neural Network Training
Non-Interactive	DoS Attacks on Blockchain Ecosystem
Non-linear Optimization Problems	Outsmarting the Atmospheric Turbulence for Ground-Based Telescopes Using the Stochastic Levenberg-Marquardt Method
nonvolatile memory	Plan-based Job Scheduling for Supercomputers with Shared Burst Buffers
Numerical Linear Algebra	Mixed Precision Incomplete and Factorized Sparse Approximate Inverse Preconditioning on GPUs
O
offloading	Memory Efficient Deep Neural Network Training
oneAPI	Exploiting co-execution with oneAPI: heterogeneity from a modern perspective
	Porting Sparse Linear Algebra to Intel GPUs
online job scheduling	Plan-based Job Scheduling for Supercomputers with Shared Burst Buffers
Online Scheduling	Taming Tail Latency in Key-Value Stores: a Scheduling Perspective
OpenCL	Kernel Fusion in OpenCL
	Feasibility study of Molecular Dynamics kernels exploitation using EngineCL
OpenMP	A Low Overhead Tasking Model for OpenMP
	OpenMP target task: tasking and target offloading on heterogeneous systems
Optimization	ECP: Data Analytics and Optimization Applications On Accelerator-Based Systems
Orthogonal resource	A log-linear (2+5/6)-approximation algorithm for parallel machine scheduling with a single orthogonal resource
Overlapping communication and computations	Communication overlapping Pipelined Conjugate Gradients for Distributed Memory Systems and Heterogeneous Architectures
P
parallel	GPU Accelerated Mahalanobis-average Hierarchical Clustering Analysis
Parallel Algorithms	A Scalable Approximation Algorithm for Weighted Longest Common Subsequence
Parallel Approximation Algorithms	A Scalable Approximation Algorithm for Weighted Longest Common Subsequence
parallel computing	Mixed Precision Incomplete and Factorized Sparse Approximate Inverse Preconditioning on GPUs
	Exploiting co-execution with oneAPI: heterogeneity from a modern perspective
	G-Morph: Induced Subgraph Isomorphism Search of Labeled Graphs on a GPU
Parallel Dynamic Programming	A Scalable Approximation Algorithm for Weighted Longest Common Subsequence
Parallel ILU preconditioner	Scalable hybrid parallel ILU preconditioner to solve sparse linear systems
parallel label propagation	An MPI-Parallel Algorithm for Mapping Complex Networks onto Hierarchical Architectures
Parallel machines	A log-linear (2+5/6)-approximation algorithm for parallel machine scheduling with a single orthogonal resource
Parallel model training	Parallelizing Automatic Model Management System for AIOps on Microservice Platforms
parallel numerical methods	Designing a 3D Parallel Memory-Aware Lattice Boltzmann Algorithm on Manycore Systems
Parallel Programming	Enabling support for zero copy semantics in an Asynchronous Task-based Programming Model
Parallel String Comparison	A Scalable Approximation Algorithm for Weighted Longest Common Subsequence
Parallel tool	Data management in EpiGraph COVID-19 epidemic simulator
Parallelism	The Past, Present, and Future of Asynchrony in C++
Parallelization	Rafiki: Task-level Capacity Planning in Distributed Stream Processing Systems
Particle simulations	Feasibility study of Molecular Dynamics kernels exploitation using EngineCL
Particle-In-Cell (PIC)	Particle-In-Cell Simulation using Asynchronous Tasking
Performance	Smart Distributed DataSets for Stream Processing
Performance Analysis	Automatic low-overhead load-imbalance detection in MPI applications
Performance Modeling	Rafiki: Task-level Capacity Planning in Distributed Stream Processing Systems
Performance optimization	A Novel Bi-Objective Optimization Algorithm on Heterogeneous HPC Platforms for Applications with Continuous Performance and Linear Energy Profiles
Performance portability	Towards High Performance Resilience using Performance Portable Abstractions
	FleCSI 2.0: The Flexible Computational Science Infrastructure Project
	Domain-Specific Runtime to Orchestrate Computation on Heterogeneous Platforms
	Feasibility study of Molecular Dynamics kernels exploitation using EngineCL
Performance scalability	TSLQueue: An Efficient Lock-free Design for Priority Queues
Pipelining	Communication overlapping Pipelined Conjugate Gradients for Distributed Memory Systems and Heterogeneous Architectures
Plasma Simulation	Particle-In-Cell Simulation using Asynchronous Tasking
polyhedral model	ALONA: Automatic Loop Nest Approximation with Reconstruction and Space Pruning
Power regulation	Sustaining Performance While Reducing Energy Consumption: A Control Theory Approach
Preconditioned Conjugate Gradient	Communication overlapping Pipelined Conjugate Gradients for Distributed Memory Systems and Heterogeneous Architectures
Preconditioning	Mixed Precision Incomplete and Factorized Sparse Approximate Inverse Preconditioning on GPUs
Priority queues	TSLQueue: An Efficient Lock-free Design for Priority Queues
Privacy	Decentralisation Over Privacy: An Analysis Of The Bisq Trade Protocol
process mapping	An MPI-Parallel Algorithm for Mapping Complex Networks onto Hierarchical Architectures
Profiling	Rafiki: Task-level Capacity Planning in Distributed Stream Processing Systems
	Monitoring Collective Communication Among GPUs
Programming model	Colony: Parallel Functions as a Service on the Cloud-Edge Continuum
programming models	Towards High Performance Resilience using Performance Portable Abstractions
public procurement	Smart Contract Based Public Procurement to Fight Corruption
Q
Quality of Service	Rafiki: Task-level Capacity Planning in Distributed Stream Processing Systems
R
RDMA	Enabling support for zero copy semantics in an Asynchronous Task-based Programming Model
Real-Time Processing	Outsmarting the Atmospheric Turbulence for Ground-Based Telescopes Using the Stochastic Levenberg-Marquardt Method
Reconfigurable architectures	Optimized Implementation of the HPCG Benchmark on Reconfigurable Hardware
Reinforcement Learning	Horizontal Scaling in Cloud using Contextual Bandits
Reliability	Faults, Errors and Failures in Extreme-Scale Supercomputers
rematerialization	Memory Efficient Deep Neural Network Training
Replica Selection	Taming Tail Latency in Key-Value Stores: a Scheduling Perspective
Replication	Geo-Distribute Cloud Applications at the Edge
Reputation	SMART: a Tool for Trust and Reputation Management in Social Media
resilience	Towards High Performance Resilience using Performance Portable Abstractions
	RDPM: An Extensible Tool for Resilience Design Patterns Modeling
	Faults, Errors and Failures in Extreme-Scale Supercomputers
Resource Management	Smart Distributed DataSets for Stream Processing
Resource Optimization	Rafiki: Task-level Capacity Planning in Distributed Stream Processing Systems
Resource sharing	Geo-Distribute Cloud Applications at the Edge
reuse distance	Low-Overhead Reuse Distance Profiling Tool for Multicore
Rigid jobs	A log-linear (2+5/6)-approximation algorithm for parallel machine scheduling with a single orthogonal resource
runtime system	PrioRAT: Criticality-Driven Prioritization Inside the On-Chip Memory Hierarchy
	Domain-Specific Runtime to Orchestrate Computation on Heterogeneous Platforms
Runtime systems	Locality-Aware Scheduling of Independent Tasks for Runtime Systems
S
SABNAtk	Parallelization and auto-scheduling of data access queries in ML workloads
Scalability	Smart Distributed DataSets for Stream Processing
scheduling	Exploiting co-execution with oneAPI: heterogeneity from a modern perspective
	Pipelined Model Parallelism: Complexity Results and Memory Considerations
	A Fixed-Parameter Algorithm for Scheduling Unit dependent Tasks with Unit Communication Delays
	Memory Efficient Deep Neural Network Training
Scheduling and optimization	Efficient GPU Computation using Task Graph Parallelism
Score-P	Automatic low-overhead load-imbalance detection in MPI applications
Sequence Alignment	A Fault Tolerant and Deadline Constrained Sequence Alignment Application on Cloud-based Spot GPU Instances
Serverless	Colony: Parallel Functions as a Service on the Cloud-Edge Continuum
Serverless computing	Towards Generating Realistic Trace for Simulating Functions-as-a-Service
Serverless trace	Towards Generating Realistic Trace for Simulating Functions-as-a-Service
Serverless workload	Towards Generating Realistic Trace for Simulating Functions-as-a-Service
Service level objective	Network SLO for High Performance Clouds
Service mesh	Geo-Distribute Cloud Applications at the Edge
Shared data-structures	TSLQueue: An Efficient Lock-free Design for Priority Queues
Sigmoid model	SMART: a Tool for Trust and Reputation Management in Social Media
simulated annealing	Plan-based Job Scheduling for Supercomputers with Shared Burst Buffers
simulation	Plan-based Job Scheduling for Supercomputers with Shared Burst Buffers
Skip lists	TSLQueue: An Efficient Lock-free Design for Priority Queues
Smart Contract	Towards a graphical DSL for tracing supply chains on blockchain
smart contracts	Smart Contract Based Public Procurement to Fight Corruption
Social media	SMART: a Tool for Trust and Reputation Management in Social Media
Software Transactional Memory	Accelerating Graph Applications Using Phased Transactional Memory
Sparse computation	SparCity: Optimizing Sparse Computation and Graphs for Novel Parallel Architectures
sparse format	Towards an efficient sparse storage format for the SpMM kernel in GPUs
Sparse Linear Algebra	Mixed Precision Incomplete and Factorized Sparse Approximate Inverse Preconditioning on GPUs
Sparse linear systems	Scalable hybrid parallel ILU preconditioner to solve sparse linear systems
Sparse matrix matrix multiplication	Towards an efficient sparse storage format for the SpMM kernel in GPUs
Spatial prisoner’s dilemma game	An Automata-based Approach to Profit Optimization of Cloud Brokers in IaaS Environment
SpMV	Porting Sparse Linear Algebra to Intel GPUs
Spot GPU	A Fault Tolerant and Deadline Constrained Sequence Alignment Application on Cloud-based Spot GPU Instances
Stochastic Levenberg-Marquardt	Outsmarting the Atmospheric Turbulence for Ground-Based Telescopes Using the Stochastic Levenberg-Marquardt Method
Stream	Colony: Parallel Functions as a Service on the Cloud-Edge Continuum
Stream Processing	Smart Distributed DataSets for Stream Processing
String Comparison	A Scalable Approximation Algorithm for Weighted Longest Common Subsequence
subgraph isomorphism	G-Morph: Induced Subgraph Isomorphism Search of Labeled Graphs on a GPU
Supercomputers	Faults, Errors and Failures in Extreme-Scale Supercomputers
Supply chain management	Towards a graphical DSL for tracing supply chains on blockchain
SYCL	Exploiting co-execution with oneAPI: heterogeneity from a modern perspective
	An Experimental Study of SYCL Task Graph Parallelism for Large-Scale Machine Learning Workloads
Synthetic node failure trace generation	Exploring the impact of node failures on the resource allocation for parallel jobs
T
Tail Latency	Taming Tail Latency in Key-Value Stores: a Scheduling Perspective
Task granularity	Understanding the Effect of Task Granularity on Execution Time in Asynchronous Many-Task Runtime Systems
task graph parallelism	Efficient GPU Computation using Task Graph Parallelism
	An Experimental Study of SYCL Task Graph Parallelism for Large-Scale Machine Learning Workloads
Task parallelism	On using modern C++ and nested recursive task parallelism for HPC applications with AllScale
Task Scheduling	Energy-Efficient Execution of Streaming Task Graphs with Parallelizable Tasks on Multicore Platforms with Core Failures
Task-based Programming	Particle-In-Cell Simulation using Asynchronous Tasking
task-based programming model	PrioRAT: Criticality-Driven Prioritization Inside the On-Chip Memory Hierarchy
Task-Based Runtimes	FleCSI 2.0: The Flexible Computational Science Infrastructure Project
Task-level parallelism	Scalable hybrid parallel ILU preconditioner to solve sparse linear systems
Tasking	OpenMP target task: tasking and target offloading on heterogeneous systems
Tasks sharing data	Locality-Aware Scheduling of Independent Tasks for Runtime Systems
Telemetry	E2EWatch: An End-to-end Anomaly Diagnosis Framework for Production HPC Systems
Tensor core	Algorithm design for Tensor Units
time-lock puzzle	Towards A Broadcast Time-Lock Based Token Exchange Protocol
tool	RDPM: An Extensible Tool for Resilience Design Patterns Modeling
trace	Trace-driven Workload Generation and Execution
TRSM	OpenMP target task: tasking and target offloading on heterogeneous systems
Trust	SMART: a Tool for Trust and Reputation Management in Social Media
V
Vector computing	Accelerating FFT using NEC SX-Aurora Vector Engine
Verifiable Delay Function	DoS Attacks on Blockchain Ecosystem
Video Transcoding	Collaborative, distributed, scalable and low-cost platform based on microservices, containers, mobile devices and Cloud services to solve compute-intensive tasks
virtual infrastructure planning	SPIRIT: A microservice-based framework for interactive Cloud infrastructure planning
W
Weighted All-Substrings LCS	A Scalable Approximation Algorithm for Weighted Longest Common Subsequence
Weighted LCS	A Scalable Approximation Algorithm for Weighted Longest Common Subsequence
Welcome	Welcome
	Presenting the Euro-Par 2022
workflow	SPIRIT: A microservice-based framework for interactive Cloud infrastructure planning
workload	Trace-driven Workload Generation and Execution
Workshare	Enhancing Load-Balancing of MPI Applications with Workshare
Z
Zero Copy	Enabling support for zero copy semantics in an Asynchronous Task-based Programming Model
Zero Copy API	Enabling support for zero copy semantics in an Asynchronous Task-based Programming Model