TALK KEYWORD INDEX
This page contains an index consisting of author-provided keywords.
A | |
Activation Sparsity | |
Adaptive scheduling | |
agentic-ai | |
AI | |
AI Accelerators | |
AIPC | |
Algorithmic Skeletons | |
AlphaTensor | |
Analog in-memory computing | |
Approximation ratio | |
ARM SVE | |
Articulation points | |
Asynchronous Data Processing | |
auction | |
Auto-scaling | |
Autonomy Loops | |
Autoscaling | |
B | |
Batch processing | |
Benchmark Suite | |
Biconnected components | |
Big Data | |
Bioinformatics | |
bottleneck detection | |
Branch prediction | |
Byzantine failures | |
C | |
Cache management | |
Caching / Paging | |
Caching update | |
carbon cost | |
CAS | |
Checkpointing | |
cloud | |
Cloud Computing | |
cloud-to-thing | |
Clustering | |
Co-Design Framework | |
co-running | |
Code generation | |
Collective Algorithms | |
Collective communication | |
Competitive analysis | |
compilation | |
Compiler Optimization | |
Compressed Communication | |
Computational Efficiency | |
computational storage device | |
compute | |
Compute Express Link (CXL) | |
Computer architecture | |
Concurrent kernel execution | |
Consensus | |
continuum | |
Convolutional Neural Networks (CNN) | |
CPU utilization | |
CUDA | |
Cut vertices | |
D | |
Data Analysis of Scientific Computing | |
Data Augmentation | |
Data Compression | |
Data structure | |
datacenter | |
Dataflow Optimization | |
Decentralized Federated Learning | |
Decentralized Systems | |
Decoupled AllReduce | |
deep domain adaptation | |
Deep Learning | |
Deep Learning Serving Systems | |
Deep Neural Network | |
Device Heterogeneity | |
Diffusion Model | |
Diffusion Model Accelerator | |
Diffusion Transformers | |
Directed Acyclic Task Graph (DATG) scheduling QR factorization | |
Disaggregated memory | |
Distributed Computing | |
Distributed Systems | |
Distributed training | |
distributed-computing | |
DNN Training | |
Docker containers | |
DPDK | |
DSL | |
Dual-Cache | |
Dynamic caching method | |
dynamic programming | |
Dynamic Resource Allocation | |
Dynamic Resource Management | |
E | |
Edge Accelerators | |
Edge AI | |
Edge computing | |
Edge Network | |
edge platform | |
Education | |
Efficiency | |
Efficient Inference on Local platforms | |
Elastic HPC | |
Elixir | |
Embedding Table | |
Emerging Memory System | |
energy-aware algorithms | |
Ethernet | |
F | |
FaaS | |
Fault Tolerance | |
Federated Learning | |
FIM | |
First-principles materials simulation | |
Floating-Point Non-Associativity | |
fog | |
FPGA Accelerator | |
function-as-a-service | |
Functional array languages | |
G | |
garbage collection | |
Generate code | |
Gigapixel Images | |
GPU | |
GPU Acceleration | |
GPU allocation | |
GPU cache management | |
GPU code generation | |
GPU scheduling | |
GPUs | |
Grace Hopper | |
Gradient compression | |
Graph Neural Network (GNN) | |
Graph Neural Networks | |
graph partitioning | |
Graph Processing | |
Green's Function | |
Grid’5000 | |
H | |
Hardware Acceleration | |
Hardware overprovisioning | |
Hardware-Efficient Inference | |
Heterogeneous Architecture | |
Heterogeneous computing | |
Heterogeneous Density Problem | |
High-performance Computing | |
High-Performance Computing (HPC) | |
High-performance numerical computing | |
HPC | |
HPC Cluster | |
HPC workloads | |
I | |
imperfect verification | |
Importance Sampling | |
in situ | |
Independent Learning | |
Index structures | |
Inference | |
Inference Acceleration | |
Inference Optimization | |
IoT Sensors | |
iterative algorithm | |
J | |
Job Scheduling | |
K | |
Kernel pairing | |
kubernetes | |
KV cache | |
L | |
Large Graph | |
Large Language Models | |
Large-scale graphs | |
latency detection | |
LBM | |
Livestock Monitoring | |
LLM | |
LLM Inference | |
LLM serving | |
LLMs | |
LLVM | |
Load Balancing | |
load-balancing | |
log-structured file system | |
M | |
Machine Learning | |
Machine Learning Workflows | |
Memory Mapping | |
Memory Resource Provisioning | |
Memory Saving | |
Mixture-of-Experts | |
MLIR | |
Model Compression | |
Model Partitioning | |
MPI | |
MPI Collective I/O | |
Multi-Agent Reinforcement Learning | |
Multi-DNN accelerators | |
Multi-DNN Inference Serving | |
Multi-GPU Training | |
Multi-threaded | |
multilinear algebra | |
N | |
Neural networks | |
Neural operators | |
Neuromorphic Computing | |
noise injection | |
Nonlinear constrained optimization | |
Novel Architectures | |
numerical linear algebra | |
O | |
Observability | |
On-chip memory | |
Online training | |
OpenMP | |
Operation Fusion | |
Operation Split | |
Optimal Transport | |
Optimizations | |
Out-of-core processing | |
P | |
Parallel | |
Parallel Algorithms | |
Parallel Computing | |
Parallel Computing on GPUs | |
Parallel Graph Computations | |
Parallel Processing | |
Parallel Programming | |
Parallel SGD | |
Parsl | |
Partitioning | |
Peer-to-peer Networks | |
performance analysis | |
Performance optimization | |
Phase Analysis | |
PMIx | |
Program Analysis | |
Programming | |
Programming models | |
Pruning | |
Pyramidal Analysis | |
Q | |
QoS | |
quality of service | |
Quantization | |
Quantum Algorithm | |
Quantum Data Storage | |
Quantum Image Processing | |
Quantum Signal Processing | |
R | |
Radio Astronomy | |
Random Walks | |
Recommender system | |
Rejection Sampling | |
Remote Offloading | |
Reproducibility | |
Residual | |
resilience | |
Resource Adaptivity | |
Resource Allocation | |
Resource Management | |
RISC-V | |
RTL simulation | |
S | |
Sampled Simulation | |
Scalable Vector Extension | |
scalar product | |
Scheduling | |
Scheduling and resource management | |
Scientific Workflows | |
Sequencing read alignment | |
Sequential Least-Squares Quadratic Programming (SLSQP) | |
Serverless | |
Serverless Computing | |
service-level agreement | |
Shared Memory | |
silent error | |
Simulation | |
Simulation Point | |
Skipping Non-Zero (SkipNZ) | |
SLA | |
SLACK | |
Slurm | |
SMT Processors | |
Sparse Architectures | |
Sparse LU factorization | |
Sparse Matrix Multiplication | |
Sparse Tensor Cores | |
Spiking Neural Network | |
Staleness | |
Stream Processing | |
Sub-batching and Sub-batch merging | |
Subtoken | |
Sunway architecture | |
Sustainable AI | |
system throughput | |
Systems for Machine Learning | |
T | |
task graph parallelism | |
Task-Operator Co-Scheduling | |
Task-parallel linear algebra computations | |
tensor contraction ordering | |
tensor decomposition | |
tensor-train decomposition | |
testbed | |
Thread-to-Core Allocation Policies | |
Trace-driven Simulation | |
Tracing and monitoring | |
transfer learning | |
Transform code | |
Triangle Counting | |
Trusted Execution Environment | |
V | |
Vector Unit | |
Vertical scaling | |
Virtualization | |
Vision Transformer | |
W | |
Wedge-Parallel Approaches | |
workload prediction | |
Write optimization |