Program for Wednesday, October 20th

10:00-12:15 Session 4: Day2

Chair:

Amit Ruhela

10:00	Amit Ruhela Welcome - Day 2
10:15	Diego Donzis The role of simulations in understanding compressible turbulence: what we know, what we want to know, and how extreme simulations can help
11:15	Mohammed Al Farhan, Huda Ibeid, Hatem Ltaief and David Keyes HiCMA++: A Modern Low-Rank Matrix Algebra C++ Framework ABSTRACT. This talk highlights how to leverage the C++ object-oriented programming paradigm, by providing a modern, high-level C++ framework, to address the challenges of developing the HiCMA++ numerical library. HiCMA++ exploits the data sparsity structure of dense operators by compressing off-diagonal tiles up to an application’s accuracy threshold. Low-rank approximations constitute a major trend in the linear algebra community, as they make it possible not only to tackle the curse of dimensionality in big data applications but also to solve a broad class of traditional large-scale HPC scientific applications. Once compressed, the matrix is represented by a collection of logical tiles, each of which may be stored in a dense or a low-rank data layout. Using function overloading and templating, HiCMA++ supports various low-rank matrix operations with dense and/or low-rank operands, multiple precision generation, compression algorithm variants, and code portability via hardware abstraction, all achieved with high productivity in mind.
11:45	Yuxi Hong, Mohammed A. Al Farhan, Jesse Cranney, Damien Gratadour, Huda Ibeid, Hatem Ltaief, Matteo Ravasi, Philippe Thierry and David Keyes OneAPI to Rule Them All: A Testcase with Tile Low-Rank Matrix-Vector Multiplication for Scientific Applications ABSTRACT. The talk presents an HPC implementation of Tile Low-Rank Matrix-Vector Multiplication (TLR-MVM) based on OneAPI. TLR-MVM is one of the most time-consuming computational kernels in seismic wave-equation-based processing and ground-based computational astronomy applications. TLR-MVM exploits the data sparsity of the respective operators and relies on an efficient data layout to saturate the memory bandwidth of the underlying hardware architectures. We investigate the OpenMP and OneAPI programming models in our TLR-MVM implementation. We further mitigate the overheads of load imbalance and maximize occupancy by properly mapping the threads to the hardware memory and core settings. We report preliminary results and show that TLR-MVM outperforms state-of-the-art dense OneMKL implementations.
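
The tile low-rank idea described in the two abstracts above can be illustrated with a minimal sketch in plain Python. This is not HiCMA++ or the authors' TLR-MVM code: the names DenseTile, LowRankTile, and tlr_mvm are hypothetical, and the sketch assumes square tiles of size nb, with each low-rank tile stored as two factors U and V such that the tile is approximated by U V^T. It shows only the arithmetic (a dense tile costs nb*nb multiplications per product, a rank-k tile costs 2*nb*k) and none of the data layout, threading, or hardware mapping the talks discuss.

```python
from dataclasses import dataclass
from typing import List, Union

Vector = List[float]
Matrix = List[List[float]]  # row-major list of rows


def matvec(a: Matrix, x: Vector) -> Vector:
    """Return a @ x for a row-major matrix a."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def matvec_transposed(a: Matrix, x: Vector) -> Vector:
    """Return a^T @ x without forming the transpose."""
    n_cols = len(a[0])
    return [sum(a[i][j] * x[i] for i in range(len(a))) for j in range(n_cols)]


@dataclass
class DenseTile:
    """Hypothetical dense tile: stores all nb x nb entries."""
    data: Matrix

    def apply(self, x: Vector) -> Vector:
        return matvec(self.data, x)


@dataclass
class LowRankTile:
    """Hypothetical low-rank tile: tile ~= u @ v^T, with u and v of shape nb x k."""
    u: Matrix
    v: Matrix

    def apply(self, x: Vector) -> Vector:
        # First project x onto the k columns of v, then expand with u.
        return matvec(self.u, matvec_transposed(self.v, x))


Tile = Union[DenseTile, LowRankTile]


def tlr_mvm(tiles: List[List[Tile]], x: Vector, nb: int) -> Vector:
    """Compute y = A @ x where A is given as a grid of nb x nb tiles."""
    y = [0.0] * (len(tiles) * nb)
    for i, tile_row in enumerate(tiles):
        for j, tile in enumerate(tile_row):
            partial = tile.apply(x[j * nb:(j + 1) * nb])
            for r, value in enumerate(partial):
                y[i * nb + r] += value
    return y
```

Both tile classes expose the same apply method, so the loop in tlr_mvm does not need to know which storage format a tile uses. This loosely mirrors, in Python terms, the mixing of dense and low-rank operands that the first abstract attributes to overloading and templating in C++.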

12:45-14:05 Session 5: Site Updates

Chair:

Nalini Kumar

12:45	Toshihiro Hanawa Site Update - Univ of Tokyo
13:00	Thomas Steinke Site Update - ZIB
13:15	John Cazes Site Update - TACC
13:30	David Martin Site Update - ANL

14:00-16:00 Session 6: Tutorial

Chair:

Clay Hughes

14:00

Kevin Oleary

Workflow and Profiling for Heterogeneous Compute

ABSTRACT. Accelerating performance through direct programming of GPUs. In this session, we will provide an introduction to the Intel OneAPI profiling tools, Intel® VTune™ Profiler and Intel® Advisor, and show how to use them to identify performance bottlenecks. The tutorial will give insights into how to get the most performance out of your Intel GPU and CPU.