TALK KEYWORD INDEX
This page is an index of author-provided keywords.
3 | |
3D reconstruction | |
A | |
Activation Sparsity | |
actor-based concurrency | |
Adaptive scheduling | |
agentic-ai | |
AI | |
AI Accelerators | |
AIPC | |
Algorithmic Skeletons | |
AlphaTensor | |
Analog in-memory computing | |
analytics | |
Apache Spark | |
Application malleability | |
Approximate Computing | |
Approximation ratio | |
ARM SVE | |
Articulation points | |
Asynchronous | |
Asynchronous Data Processing | |
Asynchronous Programming | |
auction | |
Auto-scaling | |
auto-scheduling | |
Autonomy Loops | |
Autoscaling | |
B | |
Batch processing | |
Benchmark Suite | |
benchmark suite optimization | |
Benchmarking | |
Benchmarks | |
Biconnected components | |
Big Data | |
Bioinformatics | |
bottleneck detection | |
Branch prediction | |
Byzantine failures | |
C | |
Cache | |
Cache management | |
Caching / Paging | |
Caching update | |
carbon cost | |
Carbon emissions | |
CAS | |
ccNUMA | |
CFD | |
Chapel | |
Checkpointing | |
Cloud | |
cloud computing | |
Cloud computing | |
Cloud Continuum | |
Cloud Robotics | |
cloud-to-thing | |
Clouds | |
CloudSim | |
Clustering | |
CNN Inference | |
Co-Design Framework | |
co-running | |
Code generation | |
collaborative system of systems | |
Collective Algorithms | |
Collective communication | |
Competitive analysis | |
compilation | |
Compiler Optimization | |
Composability | |
Compressed Communication | |
Computational Efficiency | |
computational storage device | |
computational workflows | |
compute | |
Compute Express Link (CXL) | |
Computed Tomography | |
Computer architecture | |
Computing Continuum | |
Concurrent kernel execution | |
Consensus | |
Continuous Profiling | |
continuum | |
Convolutional Neural Networks | |
Convolutional Neural Networks (CNN) | |
counting queries | |
CPU | |
CPU utilization | |
Critical Path | |
cross-facility workflows | |
CUDA | |
CUDA/HIP | |
Cut vertices | |
D | |
DAG | |
Data Analysis of Scientific Computing | |
Data Augmentation | |
Data Center | |
Data Classification | |
Data Compression | |
data logistic | |
Data preprocessing | |
data redistribution | |
data streaming | |
Data structure | |
Data-intensive applications | |
database | |
datacenter | |
Dataflow Optimization | |
dataflow programming | |
Decentralized | |
Decentralized Federated Learning | |
Decentralized Systems | |
Decoupled AllReduce | |
deep domain adaptation | |
Deep Learning | |
Deep Learning Serving Systems | |
Deep Neural Network | |
dense vectors | |
Dependency Flagging System | |
Dependency-aware Transaction Processing | |
Device Heterogeneity | |
DevOps | |
Diffusion Model | |
Diffusion Model Accelerator | |
Diffusion Transformers | |
Digital Twin | |
Directed Acyclic Task Graph (DATG) scheduling QR factorization | |
Disaggregated memory | |
Distributed Computing | |
Distributed computing | |
Distributed deep learning | |
Distributed Dense Linear Algebra | |
Distributed Sparse Linear Algebra | |
Distributed Systems | |
Distributed training | |
Distributed Training and Inference | |
distributed workflows | |
distributed-computing | |
DMA | |
DNN Training | |
Docker containers | |
DPDK | |
DSL | |
Dual-Cache | |
Dynamic caching method | |
Dynamic Graphs | |
Dynamic programming | |
Dynamic Resource Allocation | |
dynamic resource allocation | |
Dynamic Resource Management | |
Dynamic resource management | |
Dynamic Resources | |
E | |
eBPF | |
Edge Accelerators | |
Edge AI | |
Edge computing | |
Edge Network | |
edge platform | |
Edge-AI | |
Edge-Cloud Continuum | |
Education | |
Efficiency | |
Efficient Inference on Local platforms | |
Elastic Computing | |
Elastic HPC | |
Electronic Design Automation | |
Elixir | |
Embedding Table | |
Emerging Memory System | |
Empirical Comparison | |
energy awareness | |
Energy consumption | |
Energy Efficiency | |
Energy measurement | |
energy performance | |
Energy-Aware 3D Gaussian Splatting | |
energy-aware algorithms | |
Energy-Aware Scheduling | |
Energy-aware software engineering | |
Ethernet | |
Evolutionary computation | |
evolving applications | |
F | |
FaaS | |
Fault Tolerance | |
Fault-Free | |
Federated Learning | |
FIM | |
First-principles materials simulation | |
Floating-Point Non-Associativity | |
Flooding | |
Flowshop Scheduling | |
fog | |
FPGA | |
FPGA Accelerator | |
FPGA Demonstrator | |
FPGAs | |
function-as-a-service | |
Functional array languages | |
G | |
GANs | |
garbage collection | |
Gate sizing | |
Generate code | |
Generative Adversarial Networks | |
Genomics | |
Gigapixel Images | |
GPU | |
GPU Acceleration | |
GPU allocation | |
GPU architectures | |
GPU cache management | |
GPU code generation | |
GPU Computing | |
GPU parallel | |
GPU power modeling | |
GPU programming | |
GPU scheduling | |
GPUs | |
Grace Hopper | |
Gradient Compression | |
Graph Neural Network (GNN) | |
Graph Neural Networks | |
graph partitioning | |
Graph Processing | |
Graph Sampling | |
GraphBLAS | |
Graphics Processing Unit (GPU) | |
green computing | |
Green's Function | |
Green500 | |
Grid’5000 | |
H | |
Hardware Acceleration | |
Hardware Accelerator | |
Hardware overprovisioning | |
Hardware-Efficient Inference | |
Heterogeneous | |
Heterogeneous architecture | |
Heterogeneous computing | |
Heterogeneous Density Problem | |
High Performance Computing | |
High performance training | |
High-performance Computing | |
High-Performance Computing (HPC) | |
High-performance computing (HPC) systems | |
High-performance numerical computing | |
HPC | |
HPC applications | |
HPC Cluster | |
HPC Edge-To-Cloud | |
HPC workloads | |
Hybrid DMA-Cache | |
Hyper-parameter optimization | |
I | |
I/O malleability | |
imperfect verification | |
Importance Sampling | |
in situ | |
Independent Learning | |
Index structures | |
Inference | |
Inference Acceleration | |
Inference Optimization | |
Intermediate language | |
IoT Sensors | |
IR | |
iterative algorithm | |
J | |
Job Scheduling | |
K | |
Kernel pairing | |
Knowledge Graph (KG) | |
kubernetes | |
Kubernetes | |
KV cache | |
L | |
Large deep neural network training | |
Large Graph | |
Large Language Models | |
Large-scale graphs | |
latency detection | |
lazy evaluation | |
LBM | |
Livestock Monitoring | |
LLM | |
LLM inference | |
LLM serving | |
LLMs | |
LLVM | |
Load Balancing | |
load-balancing | |
log-structured file system | |
loop fusion | |
loop tiling | |
Low-Power GPU Rendering | |
M | |
Machine Learning | |
Machine Learning Workflows | |
Malleability | |
Medical applications | |
Memory Hierarchy | |
Memory Mapping | |
Memory Resource Provisioning | |
Memory Saving | |
Metaheuristics | |
Meteorological model | |
Mixture-of-Experts | |
MLIR | |
Model Compression | |
Model Partitioning | |
Modelling | |
molecular dynamics | |
Monitoring | |
MPI | |
MPI Collective I/O | |
MT-3000 | |
Multi-Agent Reinforcement Learning | |
Multi-DNN accelerators | |
Multi-DNN Inference Serving | |
Multi-GPU Training | |
multi-rail communication | |
multi-site workflows | |
Multi-threaded | |
multilinear algebra | |
N | |
Neural networks | |
Neural operators | |
Neuromorphic Computing | |
noise injection | |
nonblocking execution | |
Nonlinear constrained optimization | |
Novel Architectures | |
Nowcasting | |
numerical linear algebra | |
O | |
Observability | |
Offloading | |
On-chip memory | |
Online training | |
OpenCL benchmarks | |
OpenMP | |
Operational data analytics (ODA) | |
Operation Fusion | |
Operation Split | |
Optimal Transport | |
Optimizations | |
Osteosarcoma | |
Out-of-core processing | |
oversubscription | |
P | |
PageRank | |
Parallel | |
Parallel algorithms | |
Parallel Branch-and-Bound | |
Parallel Computing | |
Parallel Computing on GPUs | |
parallel computing performance | |
Parallel Graph Computations | |
Parallel Processing | |
Parallel Programming | |
Parallel Programming Automation | |
Parallel Programming Models | |
Parallel SGD | |
Parallel skeletons | |
parallel-in-time | |
Parallelism | |
Parsl | |
Particle Swarm Optimization | |
Partitioning | |
Peer-to-peer Networks | |
Performance | |
performance analysis | |
Performance Analysis Tools | |
Performance Evaluation | |
Performance optimization | |
Performance Prediction | |
Performance Tuning | |
Permissioned blockchain framework | |
Phase Analysis | |
PMIx | |
Polyhedral compilation | |
Portability | |
Power consumption | |
Privacy-Preserving Machine Learning | |
Processing-in-Memory | |
Program Analysis | |
Programming | |
Programming Languages | |
Programming models | |
Pruning | |
PTX | |
Pyramidal Analysis | |
Q | |
QoS | |
quality of service | |
Quantization | |
Quantum Algorithm | |
Quantum Data Storage | |
Quantum Image Processing | |
Quantum Signal Processing | |
R | |
Radio Astronomy | |
Random Walks | |
Real-Time Rendering Performance | |
Recommender system | |
Reduced precision | |
refactoring | |
Rejection Sampling | |
Remote Offloading | |
Reproducibility | |
Residual | |
resilience | |
Resource Adaptivity | |
Resource Allocation | |
Resource Management | |
Resource usage coordination | |
RISC-V | |
RMA | |
ROS2 | |
RTL simulation | |
Run-off | |
Runtime Analysis Tools | |
Runtime systems | |
Rust | |
S | |
SaaS | |
Sampled Simulation | |
Scalable | |
Scalable Vector Extension | |
scalar product | |
Scheduling | |
scheduling | |
Scheduling and resource management | |
Scientific Workflows | |
Scientific workflows | |
Sequencing read alignment | |
Sequential Least-Squares Quadratic Programming (SLSQP) | |
Serverless | |
Serverless Computing | |
service-level agreement | |
Shared Memory | |
silent error | |
Simulation | |
Simulation Point | |
Skipping Non-Zero (SkipNZ) | |
SLA | |
SLACK | |
Slurm | |
SMT Processors | |
Software architecture | |
software reengineering | |
Sparse Architectures | |
Sparse LU factorization | |
Sparse Matrix Multiplication | |
Sparse Tensor Cores | |
sparse vectors | |
Spiking Neural Network | |
Staleness | |
Statistical Analysis | |
Stencil | |
stencil operations | |
Stream Processing | |
Streaming Graph Processing Systems (SGPSs) | |
Sub-batching and Sub-batch merging | |
Subtoken | |
Sunway architecture | |
Sustainable AI | |
sustainable computing | |
Swirl | |
SYCL | |
system throughput | |
Systems | |
Systems for Machine Learning | |
T | |
Task graph | |
Task graph computing system | |
task graph parallelism | |
Task-Based Programming | |
Task-Operator Co-Scheduling | |
Task-parallel linear algebra computations | |
Tasking | |
TBA1 | |
TBA2 | |
TBA3 | |
tensor contraction ordering | |
tensor decomposition | |
tensor-train decomposition | |
Termination | |
testbed | |
Thread-to-Core Allocation Policies | |
Tianhe new-generation supercomputer | |
Top500 | |
Trace-driven Simulation | |
Tracing and monitoring | |
transfer learning | |
Transform code | |
Transformer | |
Triangle Counting | |
Trusted Execution Environment | |
Tsunami Forecasting | |
V | |
Vector Processor | |
Vector Unit | |
Vertical scaling | |
virtual machines | |
virtual topologies | |
Virtualization | |
Vision Transformer | |
W | |
Weather Radar | |
Wedge-Parallel Approaches | |
Workflows | |
workload prediction | |
Write optimization |