Program for Tuesday, March 24th

PROGRAM FOR TUESDAY, MARCH 24TH

Days:

previous day

next day

all days

View: session overview talk overview

07:30-08:30 Breakfast [Copper Conference Center]

08:00-10:05 Session 8A: Preconditioning. Bighorn B

Chair:

Ben Southworth

08:00	Christopher Newman, Marianne Francois and Supriyo Ghosh An implicit approach to phase field modeling of solidification for metals and alloys ABSTRACT. We develop a fully-coupled, fully-implicit approach to phase field modeling of solidification for metals and alloys. Predictive simulation of solidification in pure metals and alloys remains a significant challenge in the field of materials science, as micro-structure formation during the solidification of a material plays an important role in the properties and performance of the solid material. Our approach consists of a finite element spatial discretization of the fully-coupled nonlinear system of partial differential equations at the microscale, which is treated implicitly in time with a preconditioned Jacobian-free Newton-Krylov (JFNK) method. The approach allows timesteps larger than those restricted by the traditional explicit CFL limit and is algorithmically scalable and efficient due to an effective preconditioning strategy based on algebraic multigrid and block factorization. Preconditioner performance is analyzed in terms of parallel scalability and efficiency on emerging heterogeneous architectures.
08:25	Maxim Olshanskiy and Alexander Zhiliakov Recurrent use of AL preconditioner for fluid equations on manifolds. ABSTRACT. Fluid equations posed on manifolds arise in continuum based models of thin material layers with lateral viscosity such as lipid monolayers and plasma membranes. Beyond biological sciences, fluid equations on surfaces appear in the literature on modeling of foams, emulsions and liquid crystals. Motivated by these applications, we consider Stokes and Navier-Stokes-type equations posed on a closed smooth compact surface embedded in $R^3$. The talk briefly reviews recent developments in finite element methods and numerical analysis of such systems. We further proceed to the discussion of efficient algebraic solvers with a focus on Krylov subspace methods with an Augmented Lagrangian preconditioner.
08:50	Matteo Frigo, Nicola Castelletto and Massimiliano Ferronato Enhanced Relaxed Physical Factorization Preconditioner for Three-Field Poroelasticity ABSTRACT. In this talk, we focus on the Relaxed Physical Factorization (RPF) preconditioner for 3 x 3 block systems arising in a three-field mixed finite element formulation of poroelasticity [1]. The preconditioner is obtained by: (i) combining proper physics-based splittings of the block matrix, and (ii) introducing an optimal relaxation parameter alpha, to circumvent the need of computing accurate approximate Schur complements. A possible drawback of the original RPF preconditioner might arise inverting inner blocks in the form C^-1 = C + (1/alpha) F F^T, where C is a regular square matrix and F F^T is largely rank-deficient, for small values of alpha. A novel strategy is introduced for the efficient and stable approximate application of C^-1 to a vector, based on the use of a projection operator onto the range of F. This approach gives rise to a novel class of preconditioners. Effectiveness and robustness of the proposed enhanced RPF algorithms are demonstrated in both theoretical benchmarks and real-world large-size applications, outperforming the native version of the preconditioner. Acknowledgment: Portions of this work were performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07-NA27344. References: [1] M. Frigo, N. Castelletto, M. Ferronato. A Relaxed Physical Factorization Preconditioner for Mixed Finite Element Coupled Poromechanics. SIAM J. Sci. Comput., 41(4), B694B720.
09:15	Ken Trotti, Mariarosa Mazza, Marco Donatelli and Rolf Krause Spectral-based fast solvers for finite volume discretizations of a conservative fractional diffusion problem ABSTRACT. Fractional Diffusion Equations (FDEs) are a kind of differential model involving fractional derivatives that enables the description of anomalous diffusion phenomena appearing in several applicative fields. In this paper, we focus on a two-dimensional conservative steady-state FDE. As is typical for problems in conservative form, we adopt a Finite Volume (FV) discretization approach. Precisely, we use both classical FVs and the so-called Finite Volume Elements (FVEs). The latter method differs from the standard one by the fact that it restricts the admissible solutions to a finite element space. While FVEs have already been applied in the FDE context, classical FVs have not yet taken into consideration. This paper is aimed at uncovering how the two schemes spectrally compare in presence of uniform meshes, and which of the two allows a faster and easier solution through spectral-based iterative solvers. In detail, by exploiting the Toeplitz-like structure and the related symbol, we perform a qualitative study of the spectrum and the conditioning of the considered matrices, and we use the obtained information for the design of ad hoc banded preconditioners and multigrid methods.

08:00-10:05 Session 8B: Quantum Linear Algebra Algorithms I. Bighorn C/1

Chair:

Lin Lin

08:00	Chao Yang Quantum Computing From a Linear Algebra Perspective ABSTRACT. This talk is a tutorial on the basics of quantum computing from a linear algebra point of view. I will first describe the linear algebra building blocks of quantum computing in terms of tensors, unitary transformations and their decompositions. I will then discuss the main techniques used in quantum algorithms to reduce the complexity of classical algorithms. I will explain a few well known quantum algorithms such as quantum teleportation, quantum Fourier transform, phase estimation and the Grover's algorithm.
08:25	Jarrod McClean Variational quantum algorithms running on quantum supreme chips ABSTRACT. The promise of quantum computing to accelerate the solution of generic problems ranging from the computation of eigenvalues to combinatorial optimization and machine learning has been documented theoretically for some time. Recently, quantum computers empirically demonstrated the ability to perform selected computations that exceed the reach of today's supercomputers, but they have not yet shown such a demonstration for one of the key applications of interest. In this talk, I broadly introduce the landscape of quantum computing and contextualize the recent quantum supremacy result by the Google team. I will then discuss some of the leading algorithms for near-term quantum computers, which are iterative variational algorithms, that attempt to leverage this capability to address problems in finding the lowest eigenvalues and eigenvectors of matrices with exponentially large dimensions and are similarly applicable to problems in mathematical optimization. Finally, I will put this into perspective with the current state of quantum hardware, and finish with an outlook for the timeline to practical applications.
08:50	Aram Harrow and John Napp Low-depth gradient measurements can improve convergence in variational hybrid quantum-classical algorithms ABSTRACT. A broad class of hybrid quantum-classical algorithms known as "variational algorithms" have been proposed in the context of quantum simulation, machine learning, and combinatorial optimization as a means of potentially achieving a quantum speedup on a near-term quantum device for a problem of practical interest. Such algorithms use the quantum device only to prepare parameterized quantum states and make simple measurements. A classical controller uses the measurement results to perform an optimization of a classical function induced by a quantum observable which defines the problem. While most prior works have considered optimization strategies based on estimating the objective function and doing a derivative-free or finite-difference-based optimization, some recent proposals involve directly measuring observables corresponding to the gradient of the objective function. The measurement procedure needed requires coherence time barely longer than that needed to prepare a trial state. We prove that strategies based on such gradient measurements can admit substantially faster rates of convergence to the optimum in some contexts. We first introduce a natural black-box setting for variational algorithms which we prove our results with respect to. We define a simple class of problems for which a variational algorithm based on low-depth gradient measurements and stochastic gradient descent converges to the optimum substantially faster than any possible strategy based on estimating the objective function itself, and show that stochastic gradient descent is essentially optimal for this problem. Importing known results from the stochastic optimization literature, we also derive rigorous upper bounds on the cost of variational optimization in a convex region when using gradient measurements in conjunction with certain stochastic gradient descent or stochastic mirror descent algorithms.
09:15	Dong An and Lin Lin Quantum linear system solver based on time-optimal adiabatic quantum computing and quantum approximate optimization algorithm ABSTRACT. In the past few years, there has been significant progress on solving linear systems of equations by quantum algorithms, including HHL, linear combination of unitaries and quantum signal processing. We investigate how adiabatic quantum computing (AQC) and quantum approximate optimization algorithm (QAOA) can offer an alternative route for solving quantum linear system problem. We demonstrate that with an optimally tuned scheduling function, AQC can solve a quantum linear system problem with complexity $\widetilde{O}(d\kappa~\text{poly}\log(d\kappa/\epsilon))$, assuming the matrix is $d$-sparse with condition number $\kappa$, and $\epsilon$ is the target accuracy. This achieves significant speedup over classical algorithms with respect to the dimension $N$ and at least quadratic speedup over the standard vanilla AQC. It also reaches the complexity lower bound with respect to both $\kappa$ and $\epsilon$ for quantum algorithms. The success of the time-optimal AQC implies that QAOA with optimal protocol can yield at least comparable runtime complexity compared to the time-optimal AQC and vanilla AQC.

08:00-10:05 Session 8C: Transport and Nuclear Applications. Bighorn C/2

Chair:

Steven Hamilton

08:00	Steven Hamilton and Thomas Evans Domain Decomposed Monte Carlo Transport on GPUs in Shift ABSTRACT. Efficient execution of Monte Carlo radiation transport on GPU architectures presents many challenges due to variability among particle histories, indirection in memory accesses, and the large amount of functional code within the parallelizable region. The Shift code developed at ORNL has demonstrated significant performance gains in GPU code relative to traditional CPU architectures through the use of an event-based approach which breaks larger computational kernels into smaller pieces to reduce thread divergence and improve device occupancy. Distributed memory parallelism has so far been limited to domain replication, which allows for extremely high parallel efficiency. However, for current target problems such as reactor depletion calculations, the amount of memory required to store the large tally fields exceeds the memory available on a single GPU. For these problems, it is necessary to distribute the memory across some number of devices rather than replicating the full problem on each device. In this talk, we describe the implementation of spatial domain decomposition within the Shift GPU solver. The focus will be on the interaction of particle communication with the event-based GPU algorithm. The inefficiency of dynamic memory allocation and the high levels of concurrency on GPUs necessitate different algorithmic choices relative to the existing CPU-based domain decomposition algorithm. As with the CPU algorithm, the GPU approach allows different spatial domains to be replicated by different factors to achieve better load balancing across the problem. Preliminary numerical results utilizing the Summit supercomputer will be presented, including a study on the impact of CUDA-aware MPI.
08:25	Fande Kong Scalable multilevel domain decomposition methods with SG coarse spaces for the multigroup neutron transport equations ABSTRACT. Numerical simulation of the multigroup neutron transport equations using modern supercomputers is important in nuclear reactor analysis. However, the development and design of efficient parallel algorithms is challenging. In this talk, we concern on scalable multilevel domain decomposition methods with SG coarse spaces for the neutron transport criticality calculations. Here ``SG coarse spaces" represents ``Single-component Grid based coarse spaces". SG coarse spaces are constructed using the single-component grid corresponding to a spatial matrix. Compared with the unmodified methods, the new domain decomposition methods equipped with SG coarse spaces are much faster. We numerically demonstrate that the proposed multilevel domain decomposition methods are scalable with more than 10,000 processor cores for realistic applications with billions of unknowns. Author comment: Please consider the abstract to be part of the “Transport and Nuclear Applications” session.
08:50	Hyeongkae Park and Marc Charest Multimaterial, three temperature radiation-hydrodynamics solver for ICF applications ABSTRACT. Accurate modeling of neutral particle transport behavior is of great importance in many science and engineering applications. One example is the simulation of radiation flow, which is the dominant energy transfer mechanism in many high-energy density physics (HEDP) applications, such as Inertial Confinement Fusion (ICF). Two-temperature radiation-hydrodynamics model is often used to model a HEDP problem. However, in ICF modelling, it is important to resolve an ion-electron energy equilibration time-scale, and it is desirable to model “three-temperature (3T)” radiation-hydrodynamics model to account for a non-equilibrium effect. Recently, we have extended our multiscale Lagrangian radiation-hydrodynamics scheme to incorporate 3T system. The novelty of our scheme is to seamlessly enable 3T physics only in the material of interest, such as the Deuterium-Tritium gas of an ICF capsule. In this presentation, we present our overall multiscale 3T radiation-hydrodynamics algorithm and algorithmic performance from a simple indirect ICF implosion/explosion simulations.
09:15	Andrew Till and Jim Warsa Iterative Methods for Thermal Radiation Transport with Multiple Preconditioners ABSTRACT. Discretizing and linearizing the stiff, seven-dimensional thermal radiation transport (TRT) equations with an implicit time discretization often requires an iterative solve of the resultant sparse, linear system. While GMRES preconditioned by source iteration and linear, physics-based, multifrequency-grey acceleration (LMFG) is often rapidly convergent, there are other preconditioner options that warrant exploration. We investigate four physics-based, diffusion-synthetic-acceleration (DSA) preconditioners, each of which accelerates some part of the scattering source. F-DSA targets the entire (full) scattering operator. WG-DSA targets the diagonal component (within-group) of the scattering operator. LMFG targets a specific rank-1 component of the scattering operator, with our non-standard implementation using a heuristic. TG (two-grid) targets another rank-1 component of the scattering operator, relying on a per-spatial-cell eigenvalue solve. We compare LMFG, LMFG plus WG-DSA, TG, and F-DSA. In this work, we use the time-dependent, heterogeneous Yang problem with real Compton scattering and absorption opacities to study the effect of using multiple physics-based preconditioners.
09:40	Alex Long Data motion analysis for Implicit Monte Carlo on CPUs and GPUs ABSTRACT. The Implicit Monte Carlo (IMC) method accurately solves thermal radiative transfer problems but quickly becomes computationally expensive when portions of the seven dimensional phase space are refined. As high performance computing systems move towards nodes with many accelerators, the need to adapt IMC for GPUs becomes critical. Two concerns in adapting the IMC method for GPU architectures are the memory size and memory access patterns. In the IMC method a particle moves through a discrete spatial mesh and interacts with the background material. This requires temperature dependent property lookups in every spatial cell. The size of the temperature dependent data is proportional to the number of discrete energy groups chosen in a simulation. If a particle’s full history is processed and then the next particle’s full history is processed (as in the history-based method) two successive particle histories may be in different energy groups and spatial cells from each other, thus requiring very different memory locations to be processed in quick succession. This results in poor spatial and temporal cache locality, which in turn makes single instruction multiple data (SIMD) processing difficult. For this reason, we seek alternative tracking methods and transport methods. One possible option is Woodcock tracking, where a group of spatial cells is represented with the largest interaction rate and each interaction is checked to see if should have actually occurred in a given material. If interaction rates are relatively small, this method significantly reduces memory size requirements and simplifies data access patterns. Woodcock tracking could show greater benefit on a GPU architecture compared to a CPU provided the interaction rate is not too large. Another possible downside of the Woodcock method is that continuous absorption, a variance reduction technique where a particle continuously deposits energy as it moves, is not possible given the artificially larger interaction rate used. In this work we measure the cost of the multigroup energy discretization in the context of history-based tracking, event-based tracking algorithms on GPUs and CPUs. For each method we estimate the amount of data required to compute particle histories. We also use a simple Woodcock tracking method to compare against the multigroup results. Because Woodcock tracking does not allow continuous absorption, we compare the variance between tracking methods. The runtime and variance inform the optimal algorithm choice for a specific architecture.

10:05-10:25 Coffee & Tea Service

10:25-12:30 Session 9A: Eigenvalue and SVD Computations. Bighorn B

Chair:

Andreas Stathopoulos

10:25	Jennifer Picucci and James Baglama Hybrid Iterative Refined Methods for Symmetric Eigenvalue Problems ABSTRACT. Large sparse symmetric eigenvalue problems are some of the most important and profoundly studied areas in numerical linear algebra. These problems have been historically studied and possess a wealth of applications with numerical features that make them attractive for developing new innovative routines. Although these problems are numerically attractive, they do have computational challenges for even the best modern routines. We have developed a hybrid restarted Lanczos method that combines restarting with Ritz vectors with restarting with iterative refined Ritz vectors to compute the largest eigenvalues and associated eigenvectors of a large sparse symmetric matrix A. The method is designed to use at little storage space as possible and only requires the evaluation of matrix-vector products with A. Some theoretical results for iterative refined Ritz will be provided. The iterative refined Ritz vectors is a new strategy based on refined Ritz approximations that has proven via numerical examples to perform better than the standard computation process for refined Ritz approximations. Numerous computed examples illustrate the performance of the method.
10:50	Steven Goldenberg and Andreas Stathopoulos Low-Rank Stopping Criteria for Block Parallel SVD ABSTRACT. The singular value decomposition (SVD) is one of the most commonly used low rank approximation techniques due to its optimality for all Schatten p-norms. However, most SVD algorithms only provide a residual stopping criteria at best, and at worst, stop based purely on iterations. Additionally, they require practitioners to determine the ideal rank, often without prior spectral information. We propose new stopping criteria for block parallel SVD algorithms and show their performance in both synthetic and real-world applications.
11:15	Jeremy Myers and Andreas Stathopoulos HiSVD: A Hybrid Incremental SVD Method for Streaming Large, Sparse Matrices ABSTRACT. Modern datasets continue to grow in size as technologies to collect and store big data become more widespread. The need has also grown for low-cost representations that keep only the most important features, often computed through the singular value decomposition (SVD) with iterative solvers for large matrices. When data is streamed, the full matrix is never available and the true SVD cannot be computed. Incremental SVD methods store only a window of the data each time and update a low-rank approximation on the fly. Typically small windows are used so that the SVD can be carried out efficiently. However, larger windows achieve better accuracy of the low-rank approximation. The Hybrid Incremental Singular Value Decomposition (HiSVD) is an engineering solution to approximate the SVD of a large, sparse matrix in the streaming setting. HiSVD updates a small, dense low-rank approximation by appending large amounts of sparse data and computing the SVD iteratively. HiSVD is only a small factor more expensive than the iterative solver it uses. HiSVD is effective in producing accurate approximations of sparse matrices where there is a large spectral distance between the most and least important features. This work demonstrates the successes and limitations of HiSVD in different streaming settings with respect to cost, accuracy, and sparsity in the data. Last, this work proposes using out-of-sample testing to change stopping criterion adaptively and reduce wasteful computations.
11:40	Ru Huang, Yuanzhe Xi and Michele Benzi Computing Generalized Matrix Functions with Singular Value Estimation ABSTRACT. We present an efficient algorithm based on polynomial interpolation to compute the value of a generalized matrix function defined by the matrix's singular value decomposition. Instead of interpolating the function on the entire interval of interest, we focus only on the interpolation accuracy at locations of the singular values. The idea of constructing the polynomial is to first obtain the singular value distribution of a matrix through Golob-Kahan bidiagonalization, and then use a three-term recurrence with a weighted inner product, which is related to the estimated distribution of singular values, to construct an orthonormal basis of odd polynomials and corresponding coefficients.
12:05	Vasileios Kalantzis Domain decomposition Rayleigh-Ritz approaches for symmetric generalized eigenvalue problems ABSTRACT. This paper proposes a parallel domain decomposition Rayleigh-Ritz projection scheme to compute a selected number of eigenvalues (and, optionally, associated eigenvectors) of large and sparse symmetric pencils. The projection subspace associated with interface variables is built by computing a few of the eigenvectors and associated first and second derivative of a zeroth-order approximation of the non-linear matrix-valued interface operator. On the other hand, the projection subspace associated with interior variables is built independently in each subdomain by exploiting local eigenmodes and matrix resolvent approximations. The sought eigenpairs are then approximated by a Rayleigh-Ritz projection onto the subspace formed by the direct sum of these two subspaces. Several theoretical and practical details are discussed, and upper bounds of the approximation errors are provided. Our numerical experiments demonstrate the efficiency of the proposed technique on sequential/distributed memory architectures as well as its competitiveness against schemes such as shift-and-invert Lanczos, and Automated MultiLevel Substructuring combined with $p$-way vertex-based partitionings.

10:25-12:30 Session 9B: Quantum Linear Algebra Algorithms II. Bighorn C/1

Chair:

Chao Yang

10:25	Lin Lin Quantum primitives and quantum linear algebra ABSTRACT. Many known quantum algorithms are actually based on a relatively small set of quantum "primitives", such as quantum adiabatic theorem, quantum Fourier transform and quantum walk. Recently, linear combination of unitaries (LCU) and quantum signal processing (QSP) are added to this list, particular in the context of quantum linear algebra, such as solving large scale linear systems. In this talk I will try to scratch the surface of these topics, and introduce recent progresses on optimization based, stable algorithms for quantum signal processing.
10:50	Lin Lin and Yu Tong Eigenstate filtering with application to quantum linear systems ABSTRACT. We present an algorithm based on quantum signal processing (QSP) and minimax polynomial to prepare a target eigenstate of a given Hamiltonian. Compared to quantum phase estimation (QPE) our method achieves better complexity scaling in terms of the initial overlap and desired precision, and uses fewer ancilla qubits. We apply this algorithm to the quantum linear system problem (QLSP), and present two algorithms based on quantum adiabatic computing (AQC) and Quantum Zeno effect respectively. Both algorithms achieve near-optimal complexity scaling in terms of the condition number and desired precision, without using phase estimation or amplitude amplification.
11:15	Rolando Somma and Yigit Subasi Quantum state verification in linear algebra problems ABSTRACT. I will study the problem of quantum state verification in the context of linear algebra problems, with emphasis in solving systems of linear equations. I will show that any quantum operation that verifies whether a given quantum state is close to the solution of the problem requires a minimum number of state preparations that depends on quantities such as the condition number of a matrix. I will then discuss the adverse implications of these results to variational and approximate optimization approaches to linear algebra.
11:40	Hari Krovi Quantum algorithm for linear systems and applications to plasma dynamics ABSTRACT. We will survey the main algorithm (original HHL) and the various improvements to the quantum algorithm to solve linear systems of equations. Then we will discuss its applications and present some new results in simulating Plasma dynamics.

10:25-12:30 Session 9C: Preconditioning. Bighorn C/2

Chair:

Tom Manteuffel

10:25	Andrew Barker Matrix-free preconditioning for high-order finite elements ABSTRACT. We consider matrix-free preconditioning strategies for high-order finite element operators. Emerging hardware architectures feature a large degree of intrinsic parallelism and concurrency, which makes them well-suited to high-order finite element methods. Explicit matrix assembly for such methods is prohibitively expensive, so that matrix-free alternatives are essential to realize good performance on emerging architectures. Tensorized matrix-free algorithms designed for the combination of high-order finite elements and modern hardware already exist for operator-vector multiplies, but solvers and preconditioners are not as well developed in the matrix-free setting. We explore some preconditioners for unassembled linear systems which require as little discretization information as possible, so they can be thought of as analogous to algebraic multigrid in the assembled case. We explore future directions for such solvers and present some examples of their practical use in real simulations.
10:50	Nicola Castelletto, Massimiliano Ferronato, Matteo Frigo and Joshua White Preconditioning for a Stabilized Mixed-Hybrid Formulation of Biot’s Poroelasticity Equations ABSTRACT. We consider a hybridized variant of a well-established low-order mixed three-field (displacement-velocity-pressure) formulation of Biot’s poroelasticity equations based on fully-implicit time discretization. The primary unknowns in our finite element model are nodal displacements, cell-centered pressures and face-centered pressures, with the latter serving as Lagrange multipliers to enforce the continuity of Darcy’s flux across inter-element boundaries (local conservation). A suitable stabilization is needed to ensure inf-sup stability of the method. Based on a macroelement approach, this is achieved introducing a local pressure jump. The stabilized discretization yields non-symmetric algebraic systems that must be solved at every timestep of the simulation. Robust and scalable solvers are needed to enable large-scale simulations on high performance computing platforms. A class of block preconditioners for accelerating the iterative convergence by a Krylov subspace method is developed based on sparse Schur-complement approximations. The robustness and efficiency of the proposed approach are proved in both theoretical benchmarks and real-world applications. Several test cases are addressed to test the algorithm scalability on massively parallel architectures. Portions of this work were performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07-NA27344.
11:15	Scott MacLachlan, Niall Madden and Thai Nhan Peclet-robust preconditioners for singularly perturbed convection-diffusion equations ABSTRACT. Embedded in many algorithms for the simulation of problems in computational fluid dynamics is the need to accurately and efficiently model boundary layers in the flow. In this talk, we consider the simpler model of a scalar convection-diffusion equation, where the structure of the arising boundary layers is well-known in existing theory. This structure leads to layer-adapted meshes on which standard finite-difference discretizations give approximation of the solution that is independent of the singular perturbation parameter. We propose preconditioners that also reflect the layer-adapted mesh structure, resulting in "parameter-robust" discretizations and solvers for convection-diffusion equations. In this talk, we will present theory for the one-dimensional case, along with heuristics and numerical results for the two-dimensional case.
11:40	Ben Southworth Nonsymmetric block preconditioners and heterogeneous DSA, compatible with voids ABSTRACT. Although significant research has been done on 2x2 block preconditioners, there remain simple open questions on (approximate) Schur-complement preconditioning. This talk begins by presenting new theoretical results on the convergence of fixed-point and minimal-residual iterations with 2x2 block-diagonal and block-triangular preconditioners. The solution of SN transport is then discussed in the context of 2x2 block preconditioning. After posing traditional source iteration and diffusion synthetic acceleration (DSA) in this framework, we apply similar techniques to develop a subdomain DSA block preconditioner for highly heterogeneous domains. Numerical results demonstrate the new method converges up to 5x faster than existing methods on difficult heterogeneous problems, is easily applied using O(1) AMG iterations (where AMG methods tend to struggle with existing approaches), and is trivially compatible with regions of void in the domain. On a difficult hohlraum chamber problem, related to the indirect-drive approach of inertial confinement fusion, the new method is shown to be a faster and more robust preconditioner than current state-of-the-art, and only requires preconditioning 3% of the global mesh elements. Diffusion-based preconditioning is broadly used in the numerical solution of Boltzmann transport problems, and we conclude by discussing generalizations of the new preconditioning for other other linear and nonlinear transport algorithms.
12:05	Sergey Klevtsov, Nicola Castelletto and Hamdi Tchelepi Multiscale preconditioning for coupled porous media flow and geomechanics on unstructured grids PRESENTER: Sergey Klevtsov ABSTRACT. Accurately predicting dynamics of subsurface systems often requires expensive large scale numerical simulation, characterized by high grid resolution, material heterogeneity and complex geometry necessitating the use of unstructured grids. These computational challenges are exacerbated when coupled multiphysics processes are considered such as, for example, coupled fluid flow and geomechanics. This motivates the development of robust and computationally efficient solution strategies. Multiscale methods have recently emerged as an attractive preconditioning option for accelerating Krylov solvers used to solve large linear systems resulting from discretization of governing equations. In this work we focus on quasi-static poroelasticity problem in an unstructured grid setting. Similar to multigrid, a multiscale method works by defining a coarse grid and grid transfer operators (prolongation and restriction) that are used to construct a series of coarse problems and interpolate coarse solutions back onto the fine scale. Classical multiscale methods, such as multiscale finite element (MsFE) and multiscale finite volume (MsFV) are challenging to apply on unstructured grids in arbitrary domains, due to the difficulty of prescribing approximate boundary conditions that are employed as localization assumptions. The multiscale restriction-smoothed basis (MsRSB) method for finite volume fluid flow with two-point flux approximation has been introduced to overcome these challenges. Recently an enhanced version has been proposed that is capable of dealing with non-M matrices resulting from multipoint flux approximation on non-orthotropic grids as well as finite element discretization of vector problems such as linear elasticity. Present work builds on enhanced MsRSB method and further extends it. A procedure based on a variant of hierarchical graph decomposition is adopted to obtain a topological description of the coarse grid from a partitioning of fine-scale grid cells into coarse agglomerates and identify basis function support regions and their boundaries. Basis functions are then computed via restricted smoothing operating on a suitably altered system matrix, ensuring robust convergence of the iterative smoothing process. In the context of coupled poromechanical modeling, this can be done for both pressure and displacement basis functions. We demonstrate applicability and algorithmic scalability of the MsRSB approach for fully unstructured three-dimensional coupled single-phase flow and linear elasticity problems using a set of representative synthetic and realistic benchmark problems.

16:30-18:35 Session 10A: Algebraic Preconditioners. Bighorn B

Chair:

Sherry Li

16:30	Jianlin Xia Preconditioners based on enhanced structured incomplete factorization (eSIF) for general SPD matrices ABSTRACT. In this talk, we show a type of effective and robust preconditioners for general symmetric positive definte (SPD) matrices based on structured incomplete factorization (SIF), called enhanced SIF (eSIF) preconditioners. The original SIF preconditioning strategy by Xia, et al. applies compression to scaled off-diagonal blocks to yield structured preconditioners. As compared with the original SIF preconditioners, the eSIF preconditioners have three significant advantages: 1. With a similar truncation tolerance in the approximation of certain subblocks, the preconditioner can approximate the original matrix to a much higher accuracy. 2. The preconditioner is much more effective in the sense that the reduction of condition numbers is much more significant, and the eigenvalues of the preconditioned matrix are much better clustered. 3. The multilevel eSIF precondiitoners are guaranteed to be positive definite, while the original multilevel SIF preconditioner has an impractical strict requirement in order to preserve positive definiteness.
16:55	Sherry Li, Nan Ding, Samuel Williams and Yang Liu Leveraging One-Sided Communication for Sparse Triangular Solvers ABSTRACT. We present an one-sided communication-based distributed-memory sparse triangular solve (SpTRSV). SpTRSV is used in conjunction with Sparse LU to affect preconditioning in linear solvers. One-sided communication paradigms enjoy higher effective network bandwidth and lower synchronization costs compared to their two-sided counterparts. We use a passive target mode in one-sided communication to implement a synchronization-free task queue to manage the messaging between producer-consumer pairs. Whereas some numerical methods lend themselves to simple performance analysis, the DAG-based computational graph of SpTRSV demands we construct a critical path performance model in order to assess our observed performance relative to machine capabilities. In alignment with our model, our foMPI-based one-sided implementation of SpTRSV reduces communication time by 1.5x to 2.5x and improves SpTRSV solver performance by up to 2.4x compared to the SuperLU_DIST's two-sided MPI implementation running on 64 to 4,096 processes on the Cray supercomputers.
17:20	Brian Kelley and Siva Rajamanickam Multicolor Block Gauss-Seidel using Kokkos ABSTRACT. The classical Gauss-Seidel (GS) method is a very common preconditioner, but it is inherently sequential. GS may be adapted into a distributed algorithm by coloring each processor's subdomain so that all processors sharing a color may safely update the global residual in parallel [Saad 99]. The method may also be parallelized on a shared memory system by coloring individual rows. This approach achieves high performance on GPUs and multicore CPUs but slower convergence than the sequential method [Deveci 16]. We present a shared-memory version of multicolor block GS using Kokkos. The coarsened graph determined by the blocks is colored, and GS is applied in parallel to blocks of the same color. The user can manage the tradeoff between parallelism and convergence rate by tuning the block size. In practice, the optimal block size yields a preconditioner that has a similar apply time as row-colored GS, but a similar convergence rate as sequential GS. On a preconditioned CG solve with matrix af_shell7 (504k rows), row-colored GS took 2068 iterations, block GS took 1682, and sequential GS took 1646. The new block GS implementation is fully parallelized with Kokkos, including the initial setup. To achieve acceptable performance on GPUs, we show a simple new algorithm for finding an approximately balanced graph clustering. This iterative "balloon clustering" algorithm grows each cluster from a root vertex. When a cluster is forced to steal vertices from adjacent clusters, it chooses the vertices by a heuristic that seeks to minimize edge cuts and maintain a rounded shape. These characteristics are important for achieving performance in multicolor block GS because they reduce the number of colors in the coarsened graph, and thus increase the available parallelism. Coloring of the coarsened graph is done with the same implementation from [Deveci 16], but modified to produce a balanced coloring.
17:45	Shaked Regev and Michael Saunders SSAI: A symmetric sparse approximate inverse preconditioner for the conjugate gradient methods PCG and PCGLS ABSTRACT. For positive definite $Ax = b$ with explicit sparse $A$, we propose a symmetric sparse preconditioner for the conjugate gradient method. On very large systems, PCG + SSAI has proved to be more robust than sparse Cholesky factorization or PCG + incomplete Cholesky (backslash and ichol in MATLAB). For least-squares problems, an analogous form of preconditioned CGLS has also proved to be robust.

16:30-18:35 Session 10B: Quantum Linear Algebra Algorithms III. Bighorn C/1

Chair:

Jarrod McClean

16:30	Roel Van Beeumen Unitary Matrix Decompositions for Quantum Circuit Synthesis ABSTRACT. On a quantum computer, a unitary transformation must be implemented by a quantum circuit. This circuit consists of a network of quantum gates, each performing a simple unitary transformation on one or a few qubits. The process of choosing and connecting these elementary quantum gates to achieve a unitary transformation of an n-qubit input is known as quantum circuit synthesis or quantum compiling. Mathematically, this amounts to decomposing a 2^n × 2^n unitary matrix into a product of simpler unitary matrices, where each of them is a sum of Kronecker products of simple 2 × 2 matrices and elementary unitary matrices. Because the decomposition of a unitary matrix in terms of elementary unitaries is not unique, multiple circuits can be generated for the same unitary transformation. However, to make the circuit efficient, additional requirements in terms of the total number of quantum gates, or circuit depth, and noise levels must be considered. In this talk, I will give an overview of the numerical linear algebra challenges for efficient quantum compiling.
16:55	Stephen Jordan, Pedro Costa and Aaron Ostrander Quantum algorithm for simulating the wave equation ABSTRACT. We present a quantum algorithm for simulating the wave equation under Dirichlet and Neumann boundary conditions. The algorithm uses Hamiltonian simulation and quantum linear system algorithms as subroutines. It relies on factorizations of discretized Laplacian operators to allow for improved scaling in truncation errors and improved scaling for state preparation relative to general purpose linear differential equation algorithms. We also consider using Hamiltonian simulation for Klein-Gordon equations and Maxwell's equations.
17:20	Andrew Childs, Jin-Peng Liu and Aaron Ostrander High-precision quantum algorithms for partial differential equations ABSTRACT. Quantum computers can produce a quantum encoding of the solution of a system of differential equations exponentially faster than a classical algorithm can produce an explicit description. However, while high-precision quantum algorithms for linear ordinary differential equations are well established, the best previous quantum algorithms for linear partial differential equations (PDEs) have complexity poly(1/epsilon), where epsilon is the error tolerance. By developing quantum algorithms for the Poisson equation based on adaptive-order finite difference methods, we improve the complexity of quantum algorithms for linear PDEs to be poly(d, log(1/epsilon)), where d is the spatial dimension. Our algorithms apply high-precision quantum linear system algorithms to systems whose condition numbers and approximation errors we bound. It's the first kind of high-precision quantum algorithms that achieves the best known dependence on the parameters d and epsilon for the Poisson equation with homogeneous boundary conditions.

16:30-18:35 Session 10C: UQ and Montecarlo methods. Bighorn C/2

Chair:

Howard Elman

16:30	Akwum Onwunta and Howard Elman An efficient solver for nonlinear Bayesian inverse problems ABSTRACT. Bayesian inverse problems are often solved with Markov chain Monte Carlo (MCMC)-type schemes. When the problems are governed by large-scale discrete nonlinear PDEs, they are computationally challenging because one would then need to solve the forward problem at every sample point. In this talk, we consider discrete empirical interpolation method (DEIM) for the forward solves within an MCMC routine. Crucially, a preconditioning strategy for the DEIM model is also proposed to accelerate the forward solves. The preconditioned DEIM model is applied to a finite difference discretization of a nonlinear PDE in the MCMC model. Numerical experiments show that this approach yields accurate forward results. Moreover, the computational cost of solving the associated inverse problem is significantly reduced.
16:55	Emil Loevbak, Giovanni Samaey and Stefan Vandewalle Multilevel Monte Carlo with improved correlation for kinetic equations in the diffusive scaling ABSTRACT. In many application domains, it is necessary to compute the distribution of an ensemble of particles, subject to transport and collision phenomena. Kinetic equations are PDEs that model these particles in a position-velocity phase space. In the low collision regime, explicit particle-based Monte Carlo methods are an efficient way to simulate these high dimensional equations, but as the collision rate increases, these methods suffer from severe time-step constraints due to kinetic equation stiffness. Asymptotic-preserving particle schemes are able to produce stable, low-cost simulations in the high collision regime. This stability comes at the cost of a bias in the computed results. This bias increases linearly with the size of the simulation time step. In previous work, we showed that the multilevel Monte Carlo method can be used to reduce this bias by combining simulations with large and small time steps. This multilevel approach, however, requires a large amount of computation to bridge the gap from large time steps to time steps in the order of magnitude of the collision rate. In this work, we present an improved correlation strategy, which greatly reduces the cost of bridging this gap.
17:20	Ye Li A Discrete-time Cancer Immunotherapy Model under the Kolmogorov Equation View and the Reaction-Diffusion Model ABSTRACT. The immune checkpoint blockade is one of the most popular cancer treatment method, especially the way that PD-1/PD-L1 combine with CTLA-4, for the reason of good antitumor activity, long-lasting response and controllable security. In this project, we construct an ODE model based on the dynamical of tumor cells, CD4+T, cytokines, and PDL-1 in the immunotherapy system. We discretize them by a Forward Euler Method, which can be viewed through a mean-field approximation from a discrete version. Then we obtain a Kolmogorov equation for observing the probabilistic discrete-time model under the Moran process. Finally, based on our cancer immunotherapy ODE model, we construct a cancer immunotherapy reaction-diffusion model to observe the diffusion of tumor cells, CD4+T, cytokines, and PDL-1 in our cancer immunotherapy system and forecasting pharmacodynamic endpoints by the calculation and stability determination of local equilibrium. The result of this project can provide suggestions on "how to control " the dynamics of different patterns in cancer immunotherapy.
17:45	Shahaf Finder, Eran Treister and Oren Freifeld Exploiting Sparsity in the Estimation of Gaussian Mixture Models ABSTRACT. The Gaussian distribution is one of the most fundamental statistical tools for modeling data in various applications. However, estimating full covariance matrices is both prohibitively expensive and over-parametrized at large scales. This problem worsens in the Gaussian Mixture Model (GMM) setting. We consider restricting the GMM to a Gaussian Markov Random Field Mixture Model (GMRF-MM), where the precision matrices of the components are sparse. While this approach is common in the single Gaussian case, it was hardly explored so far with mixtures. To learn a GMRF-MM with a given sparsity pattern, we propose an efficient optimization method for the Maximum Likelihood Estimator (MLE) of a precision matrix for each component, and use it inside an Expectation-Maximization framework. When the sparsity pattern is unknown, we show that a seemingly natural implementation of GMRF-MM using an l-1 prior (AKA Graphical LASSO), yields models that are far from optimal. To overcome this, we suggest a procedure that improves the result of Graphical LASSO by removing its bias using a second optimization. We show that our method outperforms the dense GMM and GMRF-MM using Graphical LASSO on real and synthetic high-dimensional datasets.
18:10	Nicolas Venkovic, Paul Mycek, Luc Giraud and Olivier Le Maître Deflation and preconditioning strategies for sequences of sampled stochastic elliptic equations ABSTRACT. We are interested in the quantification of uncertainties in discretized elliptic partial differential equations with random coefficients. In sampling-based approaches, this relies on solving large numbers of symmetric positive definite linear systems with different matrices. In this work, we investigate recycling Krylov subspace strategies for the iterative solution of sequences of such systems. The linear systems are solved using deflated conjugate gradient (CG) methods, where the Krylov subspace is augmented with approximate eigenvectors of the previously sampled operator. These operators are sampled by Markov chain Monte Carlo, which leads to sequences of correlated matrices. First, the following aspects of eigenvector approximation, and their effect on deflation, are investigated: (i) projection technique, and (ii) restarting strategy of the eigen-search space. Our numerical experiments show that these aspects only impact convergence behaviors of deflated CG at the early stages of the sampling sequence. Second, unlike sequences with multiple right-hand sides and a constant operator, our experiments with multiple matrices show the necessity to orthogonalize the iterated residual of the linear system with respect to the deflation subspace, throughout the sampling sequence. Finally, we observe a synergistic effect of deflation and block-Jacobi (bJ) preconditioning. While the action of bJ preconditioners leaves a trail of isolated eigenvalues in the spectrum of the preconditioned operator, for 1D problems, the corresponding eigenvectors are well approximated by the recycling strategy. Then, up to a certain number of blocks, deflated CG methods with bJ preconditioners achieve similar convergence behaviors to those observed with CG when using algebraic multigrid (AMG) as a preconditioner.