TALK KEYWORD INDEX
This page indexes talks by author-provided keywords.
| A | |
| Accelerator | |
| Accelerators | |
| Ad-Hoc file system | |
| adaptive-precision | |
| Address translation | |
| Adversarial Attack | |
| AGCM | |
| AI Accelerators | |
| AIOps | |
| Alternative Basis Method | |
| Application workflows | |
| approximate spanning tree | |
| Approximation | |
| Approximation algorithm | |
| Approximation algorithms | |
| ARMv8-A (NEON) | |
| Assembly generation | |
| Asynchronous Federated Learning | |
| Attention | |
| Attention Importance | |
| Auto-tuning | |
| Automated tool generation | |
| Automatic Dimension Reduction | |
| B | |
| Backdoor watermark | |
| backtracking | |
| Batch scheduling resource allocation | |
| Benchmarking | |
| Bilinear Algorithms | |
| Bit Flipping Key Encapsulation | |
| Blockchains | |
| Boolean matrix multiplication | |
| Breadth-First Search | |
| Byzantine robustness | |
| C | |
| C++ Coroutine | |
| Cache | |
| Cache Efficiency | |
| Cache Side-channel Attack | |
| CGRA Mapping | |
| Chained memory access | |
| chapel | |
| Checkpoint | |
| Checkpointing | |
| Classifier Retraining | |
| Cloud applications | |
| Cloud bursting | |
| Cloud Computing | |
| Cloud migration | |
| Coarse-grained Reconfigurable Array | |
| Code Generation | |
| Coded Distributed Computation | |
| collective i/o | |
| Columnar data format | |
| Combinatorial optimization | |
| communication-computation overlap | |
| Compiler | |
| Computation-in-Memory | |
| Computer arithmetic | |
| Computer Engineering | |
| Concurrency | |
| concurrent data structures | |
| congested clique | |
| Congestion control | |
| Connected components | |
| Constraint Programming | |
| Convolutional neural networks | |
| Cost-effectiveness | |
| CPU allocation techniques | |
| Cross-Search | |
| Cross-shard transaction | |
| Cuckoo hashing | |
| CUDA | |
| D | |
| Data centers | |
| Data compression | |
| Data movement | |
| Data movement strategies | |
| Data Stream Processing | |
| Data-preprocessing | |
| Deep learning | |
| Deep neural network | |
| DeepFake detection | |
| Dense matrix-matrix multiplication | |
| Design-space explorations | |
| Developing and deploying HPC and AI/ML applications | |
| Differentiated Services | |
| dimension reduction | |
| Distributed aggregation | |
| Distributed Computing | |
| Distributed machine Learning | |
| Distributed Systems | |
| Distributed Training | |
| DMTCP | |
| DNN accelerator | |
| Domain-specific Language | |
| Dominators | |
| DPU | |
| Dynamic Frontier approach | |
| Dynamic networks | |
| E | |
| Earliest Deadline First scheduling scheme | |
| EDF | |
| Edge Computing | |
| Edge technologies | |
| EEG | |
| Energy | |
| Energy minimization | |
| ensemble simulation | |
| Epilepsy | |
| European HPC | |
| Expand Ad-Hoc | |
| Explicit Sharing | |
| Extreme-scale Scientific Software Stack | |
| F | |
| FaaS | |
| Fast Long Integer Multiplication | |
| Fault tolerance | |
| Federated Learning | |
| FedGNNs | |
| FFT | |
| Fine-grained/hierarchical locking | |
| floating-point arithmetic | |
| Fog Computing | |
| Folded Mapping Strategy | |
| FPGA | |
| Fully Homomorphic Encryption | |
| Function-as-a-Service | |
| G | |
| gem5 simulations | |
| Generative adversarial networks | |
| genome analysis | |
| GNN inference | |
| GPU | |
| GPU architecture | |
| gpu computing | |
| GPU Programming | |
| Gradient Compression | |
| Graph algorithms | |
| Graph Learning | |
| Graph Neural Network | |
| Graph Neural Networks | |
| Graph partition | |
| Grid computing resource management | |
| H | |
| Hamming space | |
| Hash function | |
| Heterogeneity | |
| Heterogeneous Graph Neural Network | |
| heterogeneous platforms | |
| HGNN Accelerator | |
| Hierarchical data structures | |
| Hierarchical sharding | |
| High Energy Physics | |
| High performance | |
| high performance computing | |
| High-Level Synthesis | |
| High-Performance Computing | |
| high-productivity | |
| HIP | |
| HLS | |
| HoL Blocking | |
| HPC | |
| hpc io | |
| HPC-AI workflow | |
| HTTP Adaptive Streaming | |
| Huge page | |
| Hybrid Clouds | |
| Hybrid Parallelism | |
| I | |
| I/O Complexity | |
| I/O forwarding | |
| Iceberg hashing | |
| Image Processing | |
| Implicit Sharing | |
| Importance | |
| Improved Multi-Dimensional Dichotomy | |
| Industrial Control | |
| Inference | |
| Injection throttling | |
| Instruction-Set Architecture | |
| Intel Data Center GPU | |
| inter-FPGA communication | |
| Interconnection networks | |
| Internet of Vehicles | |
| Intervals | |
| IoT applications | |
| Iterative solver | |
| J | |
| Joint Optimization | |
| K | |
| Key-Value store | |
| Key-Value Stores | |
| Kubernetes | |
| L | |
| Large DNN | |
| Large Language Models | |
| Linear algebra | |
| LLM | |
| Load Balance | |
| Local Update | |
| Locality-awareness | |
| lock-free | |
| Log Analysis | |
| Long-tailed and Non-IID Data | |
| loop pipelining | |
| low-rank approximations | |
| LU factorization | |
| M | |
| Machine Learning | |
| Manage and launch multi-node multi-user clusters | |
| Matrix multiplication | |
| Matrix-vector multiplication | |
| Maximum weighted clique | |
| Medical Image Splitting | |
| Memory optimization | |
| memory systems | |
| Micro-batch-based Data Parallelism | |
| minimum spanning tree | |
| Mixed precision | |
| mixed precision algorithms | |
| mixed-precision | |
| ML Ensembles | |
| MLIR | |
| Model watermarking | |
| MPI | |
| MPI-IO | |
| Multi-Get | |
| Multi-GPU | |
| Multi-Objective Optimization | |
| Multicore processors | |
| Multigrid | |
| Multithreading | |
| N | |
| near-data processing | |
| Network Digital Twin | |
| Neural Architecture Search | |
| Neural Network | |
| number of rounds | |
| O | |
| object storage targets | |
| Octree | |
| Oil & Gas Exploration | |
| One-Sided Communication | |
| online learning | |
| OpenMP | |
| Operation Fusion | |
| Optimistic Synchronisation | |
| Optimization | |
| Ownership Verification | |
| P | |
| PageRank algorithm | |
| Parallel | |
| parallel algorithm | |
| Parallel algorithms | |
| Parallel computing | |
| Parallel Discrete Event Simulation | |
| Parallel file system | |
| parallel I/O | |
| Parallel Programming | |
| Parallel Region Classification | |
| Parallel writing | |
| Performance and energy efficiency | |
| Performance Counters | |
| Performance optimization | |
| Performance projection | |
| Performance tools | |
| Persistent Memory | |
| Pipeline Parallelism | |
| pipelining scheme | |
| Platform development | |
| Poisoning Attacks | |
| Polynomial Multiplication | |
| Post-Quantum Cryptography | |
| Power capping | |
| Power-Law Graph | |
| precedence constraint | |
| Prefetching | |
| Privacy Preservation | |
| Privacy Protection | |
| Processing in Memory | |
| Processor micro-architectures | |
| Programming Abstractions | |
| Pruning | |
| PyCOMPSs | |
| Q | |
| Quantization | |
| Quotienting | |
| R | |
| randomized algorithms | |
| RDMA | |
| Redis | |
| Reference counting | |
| relaxed semantics | |
| Reliability Engineering | |
| Resistive random access memory | |
| resource allocation | |
| Resource Management | |
| response time analysis | |
| Restricted Assignment | |
| Reverse Time Migration | |
| RF circuit simulation | |
| RISC-V (RVV) | |
| RMT Pipeline | |
| Roofline model | |
| ROOT | |
| Root Cause Analysis | |
| ROS2 Multi-threaded Executor | |
| ROSS | |
| RISC-V | |
| RTL simulation | |
| S | |
| satellite constellation | |
| satellite downloading | |
| satellite network | |
| Scheduling | |
| Scheduling with rejection | |
| Scientific workflows | |
| Self-adaptive | |
| sequence alignment | |
| Serverful | |
| Serverless | |
| Service placement | |
| Service replication | |
| Shared-memory systems | |
| SIMD | |
| SIMD/Vector instructions | |
| Similarity | |
| Simulation | |
| Single Shared File | |
| Software Coupling | |
| Sparse Matrix Operations | |
| Sparse matrix reordering | |
| sparse matrix-vector product (SpMV) | |
| SPH | |
| SpMM | |
| SpMV | |
| Staleness | |
| Stencil Computation | |
| Storage System | |
| Stream Processing | |
| Synchronization | |
| Systems performance | |
| Systolic array | |
| T | |
| Task graph parallelism | |
| Tensor Core | |
| Tensor Cores | |
| Time Aware Shaper | |
| Time-Sensitive Networking | |
| Tools interface | |
| Toom-Cook | |
| Toom-Graph | |
| Traffic Scheduling | |
| Transformers | |
| Translation lookaside buffer | |
| V | |
| Vertex Pruning | |
| Video Encoding | |
| Vision Transformer | |
| VLA | |
| W | |
| Wait-free | |
| Work stealing | |
| Workload Characterization | |
| Workload Prediction | |
| workload-aware dynamic scheduler | |
| Wrapper based tools | |