IXPUG ANNUAL CONFERENCE 2021
PROGRAM FOR WEDNESDAY, OCTOBER 20TH
10:00-12:15 Session 4: Day 2
10:00
Welcome - Day 2
10:15
The role of simulations in understanding compressible turbulence: what we know, what we want to know, and how extreme simulations can help
11:15
HiCMA++: A Modern Low-Rank Matrix Algebra C++ Framework

ABSTRACT. This talk highlights how to leverage the C++ object-oriented programming paradigm by providing a modern, high-level C++ framework that addresses the challenges of developing the HiCMA++ numerical library. HiCMA++ exploits the data-sparsity structure of dense operators by compressing off-diagonal tiles up to an application’s accuracy threshold. Low-rank approximations constitute a major trend in the linear algebra community, as they make it possible not only to tackle the curse of dimensionality in big data applications but also to solve a broad class of traditional large-scale HPC scientific applications. Once compressed, the matrix is represented by a collection of logical tiles, each of which may be stored in a dense or a low-rank data layout. Using function overloading and templating, HiCMA++ supports various low-rank matrix operations with dense and/or low-rank operands, multiple-precision generation, compression algorithm variants, and code portability via hardware abstraction, all with high productivity in mind.
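As a minimal sketch of the overloading-and-templating idea, assuming nothing about the actual HiCMA++ interfaces: the hypothetical `DenseTile` and `LowRankTile` types below each get their own `gemv` overload, so the compiler selects the kernel from the operand type, while the template parameter `T` yields single- and double-precision variants from one source. The low-rank overload applies A ≈ U Vᵀ as two thin products, which is where the savings from compression come from.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical tile types for illustration only; not the HiCMA++ API.
// A dense tile stores all m*n entries in column-major order.
template <typename T>
struct DenseTile {
    std::size_t m, n;
    std::vector<T> a;  // column-major, size m*n
};

// A low-rank tile stores A ~= U * V^T with U (m x k) and V (n x k),
// i.e. (m + n) * k entries instead of m * n.
template <typename T>
struct LowRankTile {
    std::size_t m, n, k;
    std::vector<T> u;  // column-major, size m*k
    std::vector<T> v;  // column-major, size n*k
};

// y += A * x for a dense tile: O(m*n) work.
template <typename T>
void gemv(const DenseTile<T>& A, const std::vector<T>& x, std::vector<T>& y) {
    assert(x.size() == A.n && y.size() == A.m);
    for (std::size_t j = 0; j < A.n; ++j)
        for (std::size_t i = 0; i < A.m; ++i)
            y[i] += A.a[i + j * A.m] * x[j];
}

// y += U * (V^T * x) for a low-rank tile: two thin products, O((m+n)*k) work.
template <typename T>
void gemv(const LowRankTile<T>& A, const std::vector<T>& x, std::vector<T>& y) {
    assert(x.size() == A.n && y.size() == A.m);
    std::vector<T> t(A.k, T(0));  // t = V^T * x
    for (std::size_t r = 0; r < A.k; ++r)
        for (std::size_t j = 0; j < A.n; ++j)
            t[r] += A.v[j + r * A.n] * x[j];
    for (std::size_t r = 0; r < A.k; ++r)  // y += U * t
        for (std::size_t i = 0; i < A.m; ++i)
            y[i] += A.u[i + r * A.m] * t[r];
}
```

A call such as `gemv(tile, x, y)` dispatches to the dense or low-rank kernel purely from the static type of `tile`; a production library would add more operations and call tuned BLAS routines instead of these naive loops.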

11:45
OneAPI to Rule Them All: A Testcase with Tile Low-Rank Matrix-Vector Multiplication for Scientific Applications

ABSTRACT. The talk presents an HPC implementation of Tile Low-Rank Matrix-Vector Multiplication (TLR-MVM) based on OneAPI. TLR-MVM is one of the most time-consuming computational kernels in seismic wave-equation-based processing and in ground-based computational astronomy applications. TLR-MVM exploits the data sparsity of the respective operators and relies on an efficient data layout to saturate the memory bandwidth of the underlying hardware architectures. We investigate the OpenMP and OneAPI programming models in our TLR-MVM implementation. We further mitigate load-imbalance overheads and maximize occupancy by properly mapping threads to the hardware memory and core settings. We report preliminary results and show that TLR-MVM outperforms state-of-the-art dense OneMKL implementations.
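The core computation can be illustrated with a simple C++ sketch. The tile layout (`TlrMatrix`, `TlrTile`) and the function `tlr_mvm` are hypothetical and are not the implementation from the talk: each compressed tile is applied as y_i += U_ij (V_ijᵀ x_j), so the work scales with the tile ranks rather than with the full tile size. The OpenMP pragma parallelizes over tile rows, which write disjoint segments of y; the OneAPI variant and the bandwidth-oriented data layout mentioned in the abstract are not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout for illustration; not the data layout used in the talk.
// The matrix is split into mt x nt square tiles of size nb; tile (i, j) is
// stored in compressed form as U (nb x k) and V (nb x k), both column-major.
struct TlrTile {
    std::size_t k;          // rank chosen at compression time
    std::vector<double> u;  // nb * k entries
    std::vector<double> v;  // nb * k entries
};

struct TlrMatrix {
    std::size_t mt, nt, nb;
    std::vector<TlrTile> tiles;  // row-major over tiles: index i * nt + j
};

// y = A * x, computed tile by tile as y_i += U_ij * (V_ij^T * x_j).
// Each tile row i writes only its own segment of y, so the outer loop can be
// parallelized without write conflicts (the pragma is ignored without OpenMP).
std::vector<double> tlr_mvm(const TlrMatrix& A, const std::vector<double>& x) {
    const std::size_t nb = A.nb;
    assert(x.size() == A.nt * nb && A.tiles.size() == A.mt * A.nt);
    std::vector<double> y(A.mt * nb, 0.0);
#pragma omp parallel for
    for (std::size_t i = 0; i < A.mt; ++i) {
        std::vector<double> t;  // per-thread scratch for V^T * x_j
        double* yi = y.data() + i * nb;
        for (std::size_t j = 0; j < A.nt; ++j) {
            const TlrTile& tile = A.tiles[i * A.nt + j];
            const double* xj = x.data() + j * nb;
            t.assign(tile.k, 0.0);
            for (std::size_t r = 0; r < tile.k; ++r)  // t = V^T * x_j
                for (std::size_t c = 0; c < nb; ++c)
                    t[r] += tile.v[c + r * nb] * xj[c];
            for (std::size_t r = 0; r < tile.k; ++r)  // y_i += U * t
                for (std::size_t c = 0; c < nb; ++c)
                    yi[c] += tile.u[c + r * nb] * t[r];
        }
    }
    return y;
}
```

Because TLR-MVM reads each compressed tile exactly once per product, it is memory-bound, which is why the abstract emphasizes a data layout that saturates memory bandwidth rather than raw floating-point throughput.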

12:45-14:05 Session 5: Site Updates
12:45
Site Update - Univ of Tokyo
13:00
Site Update - ZIB
13:15
Site Update - TACC
13:30
Site Update - ANL
14:00-16:00 Session 6: Tutorial
14:00
Workflow and Profiling for Heterogeneous Compute

ABSTRACT. Accelerating performance through direct programming of GPUs. In this session, we will provide an introduction to the Intel OneAPI profiling tools, Intel® VTune™ Profiler and Intel® Advisor, and show how to use them to identify performance bottlenecks. The session will provide a tutorial and give insights into how to get the most performance out of your Intel GPU and CPU.