PEARC'22: PRACTICE AND EXPERIENCE IN ADVANCED RESEARCH COMPUTING 22
PROGRAM FOR MONDAY, JULY 11TH

08:30-17:00 Session 3A: Full Day Tutorial in Arlington
08:30
Managing HPC Software Complexity with Spack
PRESENTER: Todd Gamblin

ABSTRACT. The modern scientific software stack includes thousands of packages, from C, C++, and Fortran libraries, to packages written in interpreted languages like Python and R. HPC applications may depend on hundreds of packages spanning all of these ecosystems. To achieve high performance, they must also leverage low-level and difficult-to-build libraries such as MPI, BLAS, and LAPACK. Integrating this stack is extremely challenging. The complexity can be an obstacle to deployment at HPC sites and deters developers from building on each other’s work. Spack is an open source tool for HPC package management that simplifies building, installing, customizing, and sharing HPC software stacks. In the past few years, its adoption has grown rapidly: by end-users, by HPC developers, and by the world’s largest HPC centers. Spack provides a powerful and flexible dependency model, a simple Python syntax for writing package build recipes, and a repository of over 6,000 community-maintained packages. This tutorial provides a thorough introduction to Spack’s capabilities: installing and authoring packages, integrating Spack with development workflows, and using Spack for deployment at HPC facilities. Attendees will leave with foundational skills for using Spack to automate day-to-day tasks, along with deeper knowledge for applying Spack to advanced use cases.
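Spack's dependency model is driven by its spec syntax, which lets users constrain a build on the command line (e.g. a package name with a version, variants, and a compiler). As a purely illustrative sketch of the kind of constraints a spec expresses, not Spack's actual grammar or implementation, a toy parser for a simplified spec might look like:

```python
# Toy parser for a *simplified* form of Spack's spec syntax
# (name@version+variant~variant%compiler). Illustration only;
# Spack's real grammar and concretizer are far richer.
import re

def parse_spec(spec):
    """Split a simplified spec string into name, version, variants, compiler."""
    m = re.match(r"(?P<name>[\w-]+)"                # package name
                 r"(?:@(?P<version>[\w.]+))?"       # optional @version
                 r"(?P<variants>(?:[+~][\w-]+)*)"   # optional +on / ~off variants
                 r"(?:%(?P<compiler>[\w.@-]+))?$",  # optional %compiler
                 spec)
    if not m:
        raise ValueError(f"unparsable spec: {spec}")
    variants = {v[1:]: v[0] == "+"
                for v in re.findall(r"[+~][\w-]+", m.group("variants"))}
    return {"name": m.group("name"), "version": m.group("version"),
            "variants": variants, "compiler": m.group("compiler")}

print(parse_spec("hdf5@1.12.2+mpi~fortran%gcc@11.2"))
```

Spack's real concretizer resolves such constraints across an entire dependency graph; the tutorial covers the full syntax and the package recipe API.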

08:30-17:00 Session 3B: Two 1/2 Day Workshops in Beacon Hill
08:30
Navigate Life Sciences with Innovations from Dell Technologies and AMD
PRESENTER: Elliot Berger

ABSTRACT. Life Sciences offers exciting potential for research using HPC infrastructures to accelerate simulation, optimization, and machine learning. HPC has been evolving and finding uses in many new areas. Contributing to its expansion are new and more complex use cases coupled with advanced infrastructures, processors, and new types of GPUs. Dell Technologies and AMD are on a journey to provide extremely efficient, powerful HPC infrastructures that reduce time to results for research. In this workshop, you will see the latest solutions for popular life science applications and the state-of-the-art architectures, processors, and GPUs that make them possible. We will describe how GPU architectures enable high performance and show how to migrate complex parallel code to GPUs. Topics will cover best practices for compiling, debugging, and analyzing running code to get the best performance out of applications like LAMMPS, OpenMM, and RELION. We will also show how ML frameworks like PyTorch and TensorFlow can accelerate the pre- and post-processing of scientific studies.

08:30-12:00 Session 3C: 1/2 Day Tutorial in Clarendon
08:30
Developing Robust and Scalable Next Generation Workflows Applications and Systems
PRESENTER: Kyle Chard

ABSTRACT. Workflow applications are critical to scientific discovery. Technology trends and the convergence of traditional High Performance Computing (HPC) with new simulation, analysis, and machine learning (ML) approaches provide unprecedented opportunities. Traditional approaches to workflow application and system development have scalability and robustness limits. The ExaWorks project is building a workflows SDK from robust, high-performance technologies with well-defined, scalable component interfaces that can be leveraged by new and existing workflow applications and systems.

This tutorial will present the ExaWorks SDK and its constituent components: Flux, Parsl, RADICAL-Cybertools (RCT), and Swift/T. These components are widely used and openly available tools for developing workflow applications. This tutorial will outline modern workflow motifs on HPC platforms (e.g., ensemble campaigns, ML-in-the-loop), illustrate science examples of these motifs, and discuss solutions using the ExaWorks SDK. One third of the tutorial is dedicated to presentations from experts, and two thirds to hands-on exercises. Attendees will gain practical knowledge of best workflow practices for managing large-scale campaigns on the largest supercomputers. At the end of the tutorial, they will be able to apply these tools and techniques to their advanced workflows with minimal programming effort.
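One of the motifs mentioned above, the ensemble campaign, can be sketched with nothing but Python's standard library: submit many independent tasks, then gather results as they complete. This toy runs locally with threads, and `simulate` is a hypothetical stand-in; SDK components such as Parsl or RADICAL-Cybertools generalize the same submit-and-gather pattern to thousands of tasks on HPC schedulers.

```python
# Minimal ensemble-campaign sketch using only the standard library.
# A real campaign would launch simulations as jobs; `simulate` is a
# trivial placeholder here.
from concurrent.futures import ThreadPoolExecutor, as_completed

def simulate(params):
    """Stand-in for one ensemble member."""
    return sum(params)  # placeholder "simulation"

campaign = [(i, i + 1) for i in range(8)]  # a small parameter sweep

results = {}
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = {pool.submit(simulate, p): p for p in campaign}
    for fut in as_completed(futures):        # gather in completion order
        results[futures[fut]] = fut.result()

print(sorted(results.items()))
```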

08:30-12:00 Session 3D: 1/2 Day Tutorial in Boylston
08:30
Tutorial: Testing your HPC system with buildtest

ABSTRACT. HPC support teams are responsible for supporting a highly complex HPC system in which system and software stacks are periodically updated throughout the system’s lifetime. After updates, some degree of testing is necessary to regain confidence in the system, but support teams often lack a consistent and thorough testing suite that can assure the integrity of the system. buildtest [1] is a testing framework that aims to solve this problem, enabling support teams to painlessly create and run routine acceptance and regression tests to ensure that an HPC system is working optimally. buildtest functions as a testing framework in which a set of tests can be created and customized to satisfy the unique requirements of each HPC system and center. The basic building block for writing tests in the buildtest framework is a YAML file known as a buildspec. buildtest validates each buildspec against a JSON schema, automatically generates a shell script from it, and runs the test on an HPC system, either locally or via batch submission. buildtest is a testing framework, not a repository of tests that can be run on any HPC system: no two systems are the same, and there is no guarantee the same tests will be applicable at each site. This session will cover the basic concepts of buildtest with a mix of lecture and hands-on sessions using the buildtest interface and writing tests.
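As a rough illustration of the buildspec idea, a declarative test description that is validated and then turned into a runnable shell script, the following standalone sketch mimics that pipeline. The field names (`name`, `executor`, `run`) echo the general shape of a buildspec but are illustrative only, not buildtest's actual schema.

```python
# Toy illustration of the buildspec pipeline: validate a declarative
# test description, then render it as a shell script. Field names are
# hypothetical; buildtest uses a full JSON schema for validation.
REQUIRED = {"name", "executor", "run"}

def render_test(buildspec):
    """Validate a buildspec-like dict and render a shell script from it."""
    missing = REQUIRED - buildspec.keys()
    if missing:
        raise ValueError(f"buildspec missing fields: {sorted(missing)}")
    lines = ["#!/bin/bash",
             f"# test: {buildspec['name']} (executor: {buildspec['executor']})"]
    lines += buildspec["run"]  # the commands under test
    return "\n".join(lines)

spec = {"name": "check_gcc", "executor": "local.bash",
        "run": ["gcc --version"]}
print(render_test(spec))
```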

08:30-17:00 Session 3E: Full Day Workshop in Cambridge
08:30
Microsoft Workshop: Regulatory Compliance in Azure Considering Data Lifecycle Management
PRESENTER: Ana Del Campo

ABSTRACT. In this session we will discuss regulatory compliance in Azure, and specifically how we enable researchers to use regulated data in a secure yet approachable and user-friendly way. We will end with a panel discussion on data lifecycle management. These topics will be covered in session:
1. The compliance landscape: where are we now and where are we going?
2. Secure Research Environment Architecture Pattern, the key ingredients:
- Azure Policy
- Microsoft Defender for Cloud and the Compliance Dashboard
- Azure Data Factory
- AVD as the Access Plane
- Azure DevOps Template
3. Other architectural options:
- Azure Machine Learning
- Azure Synapse
- Azure Purview
- Azure Data Share
4. A panel discussion about trends in data growth, data lifecycle management, and challenges around funding of long-term retention in the era of large data sets and cloud. Panelists will be a mix of research professionals, Azure specialists and architects, and industry professionals.

08:30-12:00 Session 3F: 1/2 Day Workshop in Copley/Kenmore
08:30
Fifth Workshop on Strategies for Enhancing HPC Education and Training (SEHET22)
PRESENTER: Nitin Sukhija

ABSTRACT. High performance computing is becoming central to scientific progress in fundamental research across science, engineering, and societal domains. Rapid advances in mainstream computing technology have made it possible to run complex, large-scale scientific applications that simulate numerical models of phenomena spanning diverse scientific fields. The inherent wide distribution, heterogeneity, and dynamism of today’s and future computing and software environments provide both challenges and opportunities for cyberinfrastructure facilitators, trainers, and educators to develop, deliver, support, and prepare a diverse community of students and professionals for careers that utilize high performance computing to advance discovery.

The SEHET22 workshop is an ACM SIGHPC Education Chapter coordinated effort aimed at fostering collaboration among practitioners from traditional and emerging fields to explore strategies for enhancing computational, data-enabled, and HPC education. Attendees will discuss approaches for developing and deploying HPC training, as well as identify new challenges and opportunities for keeping up with rapid technological advances, from collaborative and online learning tools to new HPC platforms. The workshop will provide opportunities for learning about methods for conducting effective HPC education and training; promoting collaborations among HPC educators, trainers, and users; and disseminating resources, materials, lessons learned, and best practices.

08:30-17:00 Session 3G: Full Day Tutorial in Georgian
08:30
Building and Selling a Strategic Plan for your Research Computing and Data Program
PRESENTER: Patrick Schmitz

ABSTRACT. This workshop will bring together Research Computing and Data professionals to explore a formal framework for strategic planning, how to identify the stakeholders crucial to realizing a strategic plan, and successful approaches to winning support among these stakeholders. The workshop will foster the establishment of peer mentoring relationships, and an active practice of leveraging these relationships to share leading practices around strategic planning. The workshop is open to RCD professionals who are familiar with issues around supporting Research Computing and Data, have experience contributing to strategic planning, and have some exposure to the RCD Capabilities Model.

08:30-17:00 Session 3H: Two 1/2 Day Workshops in Park
08:30
DDN/Intel Habana Workshop: Enhance your on-premises AI capabilities with Habana Gaudi and DDN AI400X2

ABSTRACT. Achieve more with efficient and cost-effective AI solutions from Habana Labs and DDN. Deep learning workloads are introducing new complexities for enterprise IT. Habana, with its Gaudi AI training processors, and DDN, with its AI400X2 AI-optimized storage appliance, have partnered to simplify deep learning training with a scalable solution that can grow with your needs, from POC to AI supercomputer. In this workshop, each partner will provide an in-depth technical review of its respective AI solution and discuss how their combined AI processor/storage solution can be used to amplify AI training performance and capacity. Join us in this workshop to learn how you can train more and pay less.

08:30-12:00 Session 3I: 1/2 Day Workshop in Scollay
08:30
HPCSYSPROS Workshop
PRESENTER: David Clifton

ABSTRACT. In order to meet the demands of researchers requiring high-performance computing (HPC) resources, large-scale computational and storage machines must be built and maintained. The HPC systems professionals who tend these systems include system engineers, system administrators, network administrators, storage administrators, and operations staff who face problems that are unique to HPC systems. While many separate conferences exist for the HPC field and for the system administration field, none exist that focus specifically on the needs of HPC systems professionals. Support resources can be difficult to find to help with the issues encountered in this specialized field. Often, systems staff turn to the community as a support resource and opportunities to strengthen and grow those relationships are highly beneficial.

This workshop is designed to share solutions to common problems, provide a platform to discuss upcoming technologies, and present state-of-the-practice techniques so that HPC centers will get a better return on their investment, increase the performance and reliability of their systems, and increase the productivity of researchers. Additionally, this workshop is affiliated with the systems professionals’ chapter of ACM SIGHPC (SIGHPC SYSPROS Virtual ACM Chapter). The session will serve as an opportunity for chapter members to meet face-to-face, discuss the chapter’s yearly workshop held at SC, and continue building our community’s shared knowledge base.

08:30-17:00 Session 3J: Full Day Tutorial in Studio 1
08:30
Scalable Automation of Data Management Tasks

ABSTRACT. Globus is widely used among the PEARC community for reliable data transfer, but a growing number of computationally intensive research activities require commensurate large-scale data management. A common use case is that of high-resolution imaging instruments, e.g., cryoEM and synchrotron beamlines, that require automation of data flows to increase throughput and researcher productivity, as well as to ensure the instrument remains highly utilized. Globus platform services (including Globus Flows and Globus Auth), combined with data distribution platforms that use the Modern Research Data Portal design pattern, can greatly simplify the development and execution of automated data management tasks in this context.

We will describe how Globus platform services facilitate the construction of automated flows using our work with multiple instrument facilities as exemplars. Attendees will have the opportunity to build their own flows to move data, run analysis tasks, and share outputs with collaborators. We will also illustrate how these flows can feed into downstream data portals, science gateways, and data commons, enabling search and discovery of data by the broader community.
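The transfer, analyze, and share automation described above can be viewed as a small state machine over a flow's ordered steps. The sketch below is a standalone toy of that pattern; it does not use the Globus Flows API, and every step function is a hypothetical stand-in for the real transfer, compute, and sharing actions a flow would invoke.

```python
# Toy flow runner illustrating the transfer -> analyze -> share
# automation pattern. Globus Flows expresses such flows declaratively
# and invokes real services; these functions are stand-ins.
def transfer(state):
    state["staged"] = [f"staged:{f}" for f in state["files"]]
    return state

def analyze(state):
    state["results"] = [f"result({f})" for f in state["staged"]]
    return state

def share(state):
    state["shared_with"] = state["collaborators"]
    return state

FLOW = [transfer, analyze, share]  # ordered steps of the flow

def run_flow(state):
    """Thread the state dict through each step in order."""
    for step in FLOW:
        state = step(state)
    return state

out = run_flow({"files": ["img_001.tif"], "collaborators": ["alice"]})
print(out["results"], out["shared_with"])
```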

08:30-17:00 Session 3K: Full Day Tutorial in Studio 2
08:30
A Deep Dive into Constructing Containers for Scientific Computing and Gateways
PRESENTER: Eric Coulter

ABSTRACT. Containers have been rapidly gaining traction as a solution to run scientific computing software - portably and reproducibly - on more computing systems. The technology holds the promise of moving quickly and easily between a wide variety of computational resources, from laptops and workstations to HPC systems and cloud computing resources. However, significant barriers still exist to actually doing this in practice, particularly for well-established scientific and HPC codebases which expect to run on a particular operating system version or resource. Additional barriers exist for researchers unfamiliar with container technologies and development practices. The main goal of this full-day tutorial is to demonstrate and work through building and running non-trivial containers for scientific computing software with attendees. We provide examples of containerized software, discuss best practices for containerization workflows, and run examples in HPC and cloud environments, with and without the use of a Science Gateway to highlight the benefits and potential pitfalls of the putative portability of containers. This tutorial will highlight a variety of resources, including the Jetstream Cloud, The Airavata Gateway framework, and the Hive gateway/cluster, which will remain available to attendees via XSEDE after the conference. The subject matter will be approachable for intermediate to advanced learners, and is expected to be of interest to a diverse audience including researchers, support staff, and teams building science gateways.

08:30-12:00 Session 3L: 1/2 Day Tutorial in The Loft
08:30
Interactive Computing on the Anvil Composable Platform
PRESENTER: Erik Gough

ABSTRACT. XSEDE capacity systems have traditionally provided batch access to large scale computing systems, meeting the high-performance computing (HPC) needs of domain scientists across numerous disciplines. New usage patterns have emerged in research computing that depend on the availability of custom services such as notebooks, databases, elastic software stacks, and science gateways alongside traditional batch HPC. Anvil, an XSEDE capacity system deployed at Purdue University, integrates a high capacity, high performance computing cluster with a comprehensive ecosystem of software, access interfaces, programming environments, and composable services to form a seamless environment able to support a broad range of science and engineering applications. In this introductory-level tutorial, participants will get hands-on experience with the Anvil Composable Platform, a service that lowers the barrier to entry for deploying scalable elastic software stacks via web-based access to a Kubernetes-based private cloud.

08:30-17:00 Session 3M: Full Day Tutorial in White Hill
08:30
Open OnDemand, Open XDMoD, and ColdFront: an HPC center management toolset
PRESENTER: Dori Sajdak

ABSTRACT. The University at Buffalo Center for Computational Research (UB CCR) and Ohio Supercomputer Center (OSC) team up to offer HPC systems personnel a step-by-step tutorial for installing, configuring and using what many centers now consider vital software products for managing and enabling access to their resources. UB CCR offers two open source products - an allocations management system, ColdFront, and an HPC metrics & data analytics tool, Open XDMoD. OSC provides the open source OnDemand portal for easy, seamless web-based access for users to HPC resources. These three products have been designed to work together to provide a full package of HPC center management and access tools. In this tutorial the system administrators and software developers from OSC and UB CCR will walk attendees through the installation and configuration of each of these software packages. We’ll show how to use these three products in conjunction with each other and the Slurm job scheduler.

We will begin the tutorial with a short overview of each software product and how they tie together to provide seamless management of an HPC center. We’ll spend the first part of the tutorial demoing the installation and configuration of ColdFront and Open XDMoD. The second half will be spent on the installation of Open OnDemand and examples of configuring interactive apps. We’ll end with instructions on how to tie together Open XDMoD with Open OnDemand for access to job metrics within OnDemand and OnDemand usage data in Open XDMoD.

08:30-12:00 Session 3N: 1/2 Day Tutorial in Whittier
08:30
Building Portable, Scalable and Reproducible Scientific Workloads across Cloud and HPC
PRESENTER: Anagha Jamthe

ABSTRACT. This tutorial will focus on providing attendees exposure to cutting-edge technologies for building reproducible, portable and scalable scientific computing workloads, which can be easily run across Cloud and HPC machines. This tutorial will explain how to effectively leverage state-of-the-art open source technologies such as Jupyter, Docker and Singularity within the NSF-funded Tapis v3 platform, an Application Program Interface (API) for distributed computation. A brief introduction to each of these technologies will be covered in the tutorial, making it easy for audiences with little or no prior experience to understand the concepts. We will include several hands-on exercises, which will enable the attendees to build a complete scientific workflow that can be seamlessly moved to different execution environments, including a small virtual machine and a national-scale supercomputer. Using techniques covered in the tutorial, attendees will be able to easily share their results and analyses with one or more additional users. This tutorial will make use of a specific machine learning image classifier analysis to illustrate the concepts, but the techniques introduced can be applied to a broad class of analyses in virtually any domain of science or engineering.

08:30-17:00 Session 3P: Full Day Workshop in Tremont
08:30
Programming and Profiling Modern Multicore Processors
PRESENTER: John Cazes

ABSTRACT. Modern processors, such as Intel's Xeon Scalable line, AMD's EPYC architecture, ARM's ThunderX2 design, and IBM’s Power9 architecture, are scaling out rather than up and increasing in complexity. Because the base frequencies of large core count chips hover between 2 and 3 GHz, researchers can no longer rely on frequency scaling to increase the performance of their applications. Instead, developers must learn to take advantage of the increasing core count per processor and extract more performance per core.

To achieve good performance on modern processors, developers must write code amenable to vectorization, be aware of memory access patterns to optimize cache usage, and understand how to balance multi-process programming (MPI) with multi-threaded programming (OpenMP). This tutorial will cover serial and thread-parallel optimization including introductory and intermediate concepts of vectorization and multi-threaded programming principles. We will address CPU as well as GPU profiling techniques and tools and give a brief overview of modern HPC architectures.

The tutorial will include hands-on exercises in parallel optimization, and profiling tools will be demonstrated on TACC systems. This tutorial is designed for intermediate programmers, familiar with OpenMP and MPI, who wish to learn how to program for performance on modern architectures.

12:00-13:30 Session 4: Co-located event in Ballroom A
12:00
Campus Champions Networking Event

ABSTRACT. The Champions evening networking event is an opportunity for members of the community to come together and network with other Champions and our colleagues in other RCD communities. This activity is beneficial for building professional relationships and social networks outside of individual geographical areas. Time at the beginning of this activity will be used for administrative announcements.

Please join us for this (mostly) annual event at PEARC22.

13:30-17:00 Session 5A: 1/2 Day Tutorial in Clarendon
13:30
Developing Science Gateways with Apache Airavata
PRESENTER: Suresh Marru

ABSTRACT. The authors will provide a hands-on tour of the entire Apache Airavata framework for deploying science gateways. Hands-on demonstrations will illustrate the major components of the framework: the core task and workflow execution management system (including metadata capture); security services including authentication and account management, authorization management, permission management, and resource credential management; Python Django-based Web development and content management for end user environments; and managed file transfer and distributed data management services. We will show Apache Airavata in action through in-operation science gateways created with collaborators in multiple scientific fields that are supporting research and education.

13:30-17:00 Session 5B: 1/2 Day Workshop in Copley/Kenmore
13:30
An Introduction to Advanced Features in MPI
PRESENTER: Victor Eijkhout

ABSTRACT. The MPI library is now in version 4, but most programmers use mechanisms from MPI-1 or MPI-2 at best. This half-day tutorial, aimed at current MPI programmers, will discuss a number of MPI-3 and MPI-4 features that offer more flexibility, a more elegant expression of algorithms, and higher performance. There will be lab sessions to exercise the material.

13:30-17:00 Session 5C: 1/2 Day Workshop in Scollay
13:30
Campus Champions: Connecting Community and Resources

ABSTRACT. The Campus Champions (CC) community’s vision is to foster “the creation of a dynamic and connected community of advanced research computing professionals that promote leading practices at the frontiers of research, scholarship, teaching, and industry application.”

This workshop will cover:
* What role does a Campus Champion play at their institution?
* What new and existing research computing resources are available to CC members?
* How do CCs use XSEDE allocations?
* How do CCs facilitate research computing at their institution?
* Questions, discussions, and social time to connect and reconnect with the community.

13:30-17:00 Session 5D: 1/2 Day Tutorial in The Loft
13:30
Pegasus 5.0 Workflows with Containers
PRESENTER: Mats Rynge

ABSTRACT. Workflows are a key technology for enabling complex scientific computations. They capture the interdependencies between processing steps in data analysis and simulation pipelines as well as the mechanisms to execute those steps reliably and efficiently. Workflows can capture complex processes to promote sharing and reuse, and also provide provenance information necessary for the verification of scientific results and scientific reproducibility.

Pegasus (https://pegasus.isi.edu) is being used in a number of scientific domains doing production grade science. In 2016 the LIGO gravitational wave experiment used Pegasus to analyze instrumental data and confirm the first detection of a gravitational wave. The Southern California Earthquake Center (SCEC) based at USC, uses a Pegasus managed workflow infrastructure called Cybershake to generate hazard maps for the Southern California region. In 2019, SCEC completed the largest CyberShake study to date, producing the first physics-based PSHA maps for the Northern California region. Using Pegasus, they ran CyberShake workflows on three systems: HPC at the University of Southern California (USC), Blue Waters at the National Center for Supercomputing Applications (NCSA), and Titan at the Oak Ridge Leadership Computing Facility (OLCF), consuming about 120 million core hours of compute time. Pegasus orchestrated the execution of over 18,000 remote jobs using Globus GRAM, rvGAHP, and Condor Glideins, and transferred over 150 TB between the three systems. Pegasus is also being used in astronomy, bioinformatics, civil engineering, climate modeling, earthquake science, molecular dynamics and other complex analyses.

In 2020, we released Pegasus 5.0, a major improvement over previous releases. Pegasus 5.0 provides a brand-new Python 3 workflow API, developed from the ground up, so that in addition to generating the abstract workflow and all the catalogs, it now allows you to plan, submit, monitor, analyze, and generate statistics for your workflow.
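The declaration style of such a workflow API, where jobs and their dependencies are described abstractly and an execution plan is derived from them, can be sketched standalone. The class below is a toy illustration, not the real Pegasus Python API; the job names are hypothetical.

```python
# Standalone toy mimicking the declaration style of a workflow API:
# declare jobs and parent-child dependencies, then derive an execution
# order. Illustrative only; not the Pegasus.api interface.
from graphlib import TopologicalSorter

class Workflow:
    def __init__(self, name):
        self.name = name
        self.deps = {}  # job -> set of parent jobs

    def add_job(self, job, parents=()):
        self.deps[job] = set(parents)
        return self

    def plan(self):
        """Return one valid execution order respecting all dependencies."""
        return list(TopologicalSorter(self.deps).static_order())

# A classic "diamond" workflow: one preprocess step fans out to two
# analyses, which merge at the end.
wf = Workflow("diamond")
wf.add_job("preprocess")
wf.add_job("analyze_left", parents=["preprocess"])
wf.add_job("analyze_right", parents=["preprocess"])
wf.add_job("merge", parents=["analyze_left", "analyze_right"])
print(wf.plan())
```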

17:00-18:30 Session 6: Co-located event in Ballroom B
17:00
Women in HPC @PEARC22

ABSTRACT. The Mission of Women in HPC is "To promote, build and leverage a diverse and inclusive HPC workforce by enabling and energising those in the HPC community to increase the participation of women and highlight their contribution to the success of supercomputing. To ensure that women are treated fairly and have equal opportunities to succeed in their chosen HPC career. To ensure everyone understands the benefits of promoting and achieving inclusivity."

We at WHPC would like to hold an event for all those interested and aligned with the Mission of WHPC. We are eager to reconnect with the WHPC community of members, allies, friends, and supporters. Thank you for allowing us this opportunity.

17:30-19:30 Session 7: Co-located event in Ballroom A
17:30
CASC Networking Event at PEARC22

ABSTRACT. Join the Coalition for Academic Scientific Computation (CASC) for a networking and social reception celebrating 30 years of supporting and engaging the research computing and data community. The event is open to all PEARC22 attendees. With nearly 100 member organizations, CASC represents many of the nation’s most forward-thinking universities and computing centers. You are invited to meet the current executive committee, chat with CASC members, swap notes on your favorite PEARC22 presentations and papers, and share stories from the supercomputing, academic scientific computation, cyberinfrastructure, and research computing and data communities over the past three decades.