PROGRAM FOR WEDNESDAY, OCTOBER 20TH
10:00-12:15 Session 4: Day2
10:00 | Welcome - Day 2 |
10:15 | The role of simulations in understanding compressible turbulence: what we know, what we want to know, and how extreme simulations can help |
11:15 | HiCMA++: A Modern Low-Rank Matrix Algebra C++ Framework ABSTRACT. This talk highlights how the C++ object-oriented programming paradigm, in the form of a modern, high-level C++ framework, addresses the challenges of developing the HiCMA++ numerical library. HiCMA++ exploits the data-sparsity structure of dense operators by compressing off-diagonal tiles up to an application's accuracy threshold. Low-rank approximations are a major trend in the linear algebra community: they not only help tackle the curse of dimensionality in big data applications but also make it possible to solve a broad class of traditional large-scale HPC scientific applications. Once compressed, the matrix is represented by a collection of logical tiles, each of which may be stored in a dense or low-rank data layout. Using function overloading and templating, HiCMA++ supports various low-rank matrix operations with dense and/or low-rank operands, multiple-precision generation, compression algorithm variants, and code portability via hardware abstraction, all with high productivity in mind. |
11:45 | OneAPI to Rule Them All: A Testcase with Tile Low-Rank Matrix-Vector Multiplication for Scientific Applications ABSTRACT. The talk presents an HPC implementation of Tile Low-Rank Matrix-Vector Multiplication (TLR-MVM) based on OneAPI. TLR-MVM is one of the most time-consuming computational kernels in seismic wave-equation-based processing and ground-based computational astronomy applications. TLR-MVM exploits the data sparsity of the respective operators and relies on an efficient data layout to saturate the memory bandwidth of the underlying hardware architectures. We investigate the OpenMP and OneAPI programming models in our TLR-MVM implementation. We further mitigate the overheads of load imbalance and maximize occupancy by properly mapping threads to the hardware memory and core settings. We report preliminary results and show that TLR-MVM outperforms state-of-the-art dense OneMKL implementations. |
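For readers unfamiliar with the kernel named in the 11:45 abstract, the following is a minimal pure-Python sketch of a tile low-rank matrix-vector product. It is not the speakers' implementation: the function names, the list-of-lists storage, and the assumption that every tile (including diagonal ones) is held as a pair of factors U and V with tile approximately equal to U V^T are all hypothetical choices made for illustration. It only shows why the product costs less than a dense one: each tile contributes U (V^T x_j), which touches 2 x tile_size x rank entries instead of tile_size squared.

```python
# Illustrative sketch only; names and data layout are assumptions,
# not taken from the talk or from any library.

def matvec(M, x):
    """Dense product M x for a row-major list-of-lists M."""
    return [sum(m * xk for m, xk in zip(row, x)) for row in M]


def matvec_t(M, x):
    """Dense product M^T x, computed without forming the transpose."""
    ncols = len(M[0]) if M else 0
    out = [0.0] * ncols
    for row, xi in zip(M, x):
        for k, m in enumerate(row):
            out[k] += m * xi
    return out


def tlr_mvm(tiles, x, tile_size):
    """Compute y = A x for a matrix stored tile by tile in low-rank form.

    tiles[i][j] is a pair (U, V), each of shape tile_size x rank,
    such that tile (i, j) of A is approximated by U V^T.
    """
    y = [0.0] * (len(tiles) * tile_size)
    for i, tile_row in enumerate(tiles):
        for j, (U, V) in enumerate(tile_row):
            xj = x[j * tile_size:(j + 1) * tile_size]
            t = matvec_t(V, xj)       # length rank: V^T x_j
            contrib = matvec(U, t)    # length tile_size: U (V^T x_j)
            for r, c in enumerate(contrib):
                y[i * tile_size + r] += c
    return y


# Usage: a single 2 x 2 tile of rank 1, equal to [[3, 4], [6, 8]].
U = [[1.0], [2.0]]
V = [[3.0], [4.0]]
print(tlr_mvm([[(U, V)]], [1.0, 1.0], tile_size=2))  # [7.0, 14.0]
```

A production kernel would typically batch these small products and choose a contiguous layout for the factors, which is the kind of data-layout and thread-mapping question the abstract refers to; the sketch above ignores those concerns.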
14:00-16:00 Session 6: Tutorial
14:00 | Workflow and Profiling for Heterogeneous compute ABSTRACT. Acceleration of Performance using direct programming of GPUs. In this session, we will provide an introduction to the Intel OneAPI profiling tools, Intel® VTune™ Profiler and Intel® Advisor, and show how to use them to identify performance bottlenecks. This session will provide a tutorial and give insights into how to get the most performance out of your Intel GPU and CPU. |