TALK KEYWORD INDEX
This page contains an index consisting of author-provided keywords.
A
Accelerator
Accelerators
Ad-Hoc file system
adaptive-precision
Address translation
Adversarial Attack
AGCM
AI Accelerators
AIOps
Alternative Basis Method
Application workflows
approximate spanning tree
Approximation
Approximation algorithm
Approximation algorithms
ARMv8-A (NEON)
Assembly generation
Asynchronous Federated Learning
Attention
Attention Importance
Auto-tuning
Automated tool generation
Automatic Dimension Reduction
B
Backdoor watermark
backtracking
Batch scheduling resource allocation
Benchmarking
Bilinear Algorithms
Bit Flipping Key Encapsulation
Blockchains
Boolean matrix multiplication
Breadth-First Search
Byzantine robustness
C
C++ Coroutine
Cache
Cache Efficiency
Cache Side-channel Attack
CGRA Mapping
Chained memory access
chapel
Checkpoint
Checkpointing
Classifier Retraining
Cloud applications
Cloud bursting
Cloud Computing
Cloud migration
Coarse-grained Reconfigurable Array
Code Generation
Coded Distributed Computation
collective i/o
Columnar data format
Combinatorial optimization
communication-computation overlap
Compiler
Computation-in-Memory
Computer arithmetic
Computer Engineering
Concurrency
concurrent data structures
congested clique
Congestion control
Connected components
Constraint Programming
Convolutional neural networks
Cost-effectiveness
CPU allocation techniques
Cross-Search
Cross-shard transaction
Cuckoo hashing
CUDA
D
Data centers
Data compression
Data movement
Data movement strategies
Data Stream Processing
Data-preprocessing
Deep learning
Deep neural network
DeepFake detection
Dense matrix-matrix multiplication
Design-space explorations
Developing and deploying HPC and AI/ML applications
Differentiated Services
dimension reduction
Distributed aggregation
Distributed Computing
Distributed machine Learning
Distributed Systems
Distributed Training
DMTCP
DNN accelerator
Domain-specific Language
Dominators
DPU
Dynamic Frontier approach
Dynamic networks
E
Earliest Deadline First scheduling scheme
EDF
Edge Computing
Edge technologies
EEG
Energy
Energy minimization
ensemble simulation
Epilepsy
European HPC
Expand Ad-Hoc
Explicit Sharing
Extreme-scale Scientific Software Stack
F
FaaS
Fast Long Integer Multiplication
Fault tolerance
Federated Learning
FedGNNs
FFT
Fine-grained/hierarchical locking
floating-point arithmetic
Fog Computing
Folded Mapping Strategy
FPGA
Fully Homomorphic Encryption
Function-as-a-Service
G
gem5 simulations
Generative adversarial networks
genome analysis
GNN inference
GPU
GPU architecture
gpu computing
GPU Programming
Gradient Compression
Graph algorithms
Graph Learning
Graph Neural Network
Graph Neural Networks
Graph partition
Grid computing resource management
H
Hamming space
Hash function
Heterogeneity
Heterogeneous Graph Neural Network
heterogeneous platforms
HGNN Accelerator
Hierarchical data structures
Hierarchical sharding
High Energy Physics
High performance
high performance computing
High-Level Synthesis
High-Performance Computing
high-productivity
HIP
HLS
HoL Blocking
HPC
hpc io
HPC-AI workflow
HTTP Adaptive Streaming
Huge page
Hybrid Clouds
Hybrid Parallelism
I
I/O Complexity
I/O forwarding
Iceberg hashing
Image Processing
Implicit Sharing
Importance
Improved Multi-Dimensional Dichotomy
Industrial Control
Inference
Injection throttling
Instruction-Set Architecture
Intel Data Center GPU
inter-FPGA communication
Interconnection networks
Internet of Vehicles
Intervals
IoT applications
Iterative solver
J
Joint Optimization
K
Key-Value store
Key-Value Stores
Kubernetes
L
Large DNN
Large Language Models
Linear algebra
LLM
Load Balance
Local Update
Locality-awareness
lock-free
Log Analysis
Long-tailed and Non-IID Data
loop pipelining
low-rank approximations
LU factorization
M
Machine Learning
Manage and launch multi-node multi-user clusters
Matrix multiplication
Matrix-vector multiplication
Maximum weighted clique
Medical Image Splitting
Memory optimization
memory systems
Micro-batch-based Data Parallelism
minimum spanning tree
Mixed precision
mixed precision algorithms
mixed-precision
ML Ensembles
MLIR
Model watermarking
MPI
MPI-IO
Multi-Get
Multi-GPU
Multi-Objective Optimization
Multicore processors
Multigrid
Multithreading
N
near-data processing
Network Digital Twin
Neural Architecture Search
Neural Network
number of rounds
O
object storage targets
Octree
Oil & Gas Exploration
One-Sided Communication
online learning
OpenMP
Operation Fusion
Optimistic Synchronisation
Optimization
Ownership Verification
P
PageRank algorithm
Parallel
parallel algorithm
Parallel algorithms
Parallel computing
Parallel Discrete Event Simulation
Parallel file system
parallel I/O
Parallel Programming
Parallel Region Classification
Parallel writing
Performance and energy efficiency
Performance Counters
Performance optimization
Performance projection
Performance tools
Persistent Memory
Pipeline Parallelism
pipelining scheme
Platform development
Poisoning Attacks
Polynomial Multiplication
Post-Quantum Cryptography
Power capping
Power-Law Graph
precedence constraint
Prefetching
Privacy Preservation
Privacy Protection
Processing in Memory
Processor micro-architectures
Programming Abstractions
Pruning
PyCOMPSs
Q
Quantization
Quotienting
R
randomized algorithms
RDMA
Redis
Reference counting
relaxed semantics
Reliability Engineering
Resistive random access memory
resource allocation
Resource Management
response time analysis
Restricted Assignment
Reverse Time Migration
RF circuit simulation
RISC-V (RVV)
RMT Pipeline
Roofline model
ROOT
Root Cause Analysis
ROS2 Multi-threaded Executor
ROSS
RISC-V
RTL simulation
S
satellite constellation
satellite downloading
satellite network
Scheduling
Scheduling with rejection
Scientific workflows
Self-adaptive
sequence alignment
Serverful
Serverless
Service placement
Service replication
Shared-memory systems
SIMD
SIMD/Vector instructions
Similarity
Simulation
Single Shared File
Software Coupling
Sparse Matrix Operations
Sparse matrix reordering
sparse matrix-vector product (SpMV)
SPH
SpMM
SpMV
Staleness
Stencil Computation
Storage System
Stream Processing
Synchronization
Systems performance
Systolic array
T
Task graph parallelism
Tensor Core
Tensor Cores
Time Aware Shaper
Time-Sensitive Networking
Tools interface
Toom-Cook
Toom-Graph
Traffic Scheduling
Transformers
Translation lookaside buffer
V
Vertex Pruning
Video Encoding
Vision Transformer
VLA
W
Wait-free
Work stealing
Workload Characterization
Workload Prediction
workload-aware dynamic scheduler
Wrapper based tools