View: session overviewtalk overview

09:00 | Performance of Multigrid Reduction in Time (MGRIT) on GPU machines ABSTRACT. Multigrid reduction in time (MGRIT) is an optimal, scalable, multilevel parallel in time algorithm that has been shown to achieve significant speedup vs. sequential time stepping on massively parallel CPU machines. This speedup is attained through the use of additional computational resources and only occurs after a "cross-over point" where sufficient parallel resources are concurrently used. This talk examines how the move to GPU architectures impacts speedup, the crossover point, and the best choice of parallel decomposition of the space-time domain for MGRIT compared to CPU architectures. |

09:25 | Space -Time AMG for Advection - Diffusion PRESENTER: Ahsan Ali ABSTRACT. Hyperbolic PDEs, which arise from many important applications, remain a challenging problem for AMG, because the resulting linear systems are highly non-symmetric. AMG for such problems is typically hard because the traditional coarsening, interpolation and relaxation strategies designed for symmetric positive definite systems can break down. Additionally, effective AMG for hyperbolic problems would provide a parallel-in-time solver in this area of traditional difficulty. It has been observed that for space-time advection-diffusion problems, classical AMG, as well as nonlinear variants of AMG such as local approximate ideal restriction, can suffer from slow convergence and instability issues. In this work, we develop an adaptive-like method which can transition continuously from AIR for advective problems to a more classical AMG approach for diffusive. The goal is a robust solver which addresses various space-time advection-diffusion problems including purely advective, purely diffusive and the advection-diffusion mixed regime. |

09:50 | Fast multigrid reduction-in-time for linear advection PRESENTER: Oliver Krzysik ABSTRACT. Many iterative parallel-in-time algorithms have been shown highly efficient for diffusion-dominated partial differential equations (PDEs), but often are inefficient or even divergent when applied to advection-dominated PDEs. We consider the application of the iterative two-level Parareal algorithm, and its multilevel generalization of multigrid reduction-in-time (MGRIT) to linear advection PDEs. The key to efficient time integration with these methods is using a coarse-grid operator that provides a sufficiently accurate approximation to the the so-called ideal coarse-grid operator. For certain classes of semi-Lagrangian discretization, we present a novel semi-Lagrangian-like coarse-grid operator that leads to fast multilevel time integration of linear advection PDEs. The operator is composed of a semi-Lagrangian discretization followed by a correction term, with the correction designed so that the leading-order truncation error of the composite operator is approximately equal to that of the ideal coarse-grid operator. Parallel results show speed-ups over sequential time integration for variable-wave-speed advection problems using high-order discretizations. |

10:15 | Parameter-Robust Preconditioning for Oseen Iteration Applied to Navier-Stokes Control Problems PRESENTER: Santolo Leveque ABSTRACT. In this talk we will present novel, fast, and effective preconditioned iterative methods for distributed time-dependent Navier-Stokes control problems with Crank-Nicolson discretization in time. The key ingredients of the solver are a saddle-point type approximation for the linear systems, an inner iteration for the (1,1)-block accelerated by a preconditioner for convection-diffusion control, and an approximation to the Schur complement based on a potent commutator argument applied to an appropriate block matrix. A range of numerical experiments validate the effectiveness of our new approach. |

10:40 | Multigrid-augmented deep learning preconditioners for the Helmholtz equation PRESENTER: Eran Treister ABSTRACT. We present a data-driven approach to iteratively solve the discrete heterogeneous Helmholtz equation at high wavenumbers. We combine multigrid ingredients with convolutional neural networks (CNNs) to form a preconditioner which is applied within a Krylov solver. Two types of preconditioners are proposed 1) U-Net as a coarse grid solver, and 2) U-Net as a deflation operator with shifted Laplacian V-cycles. The resulting CNN preconditioner can generalize over residuals and a relatively general set of wave slowness models. On top of that, we offer an encoder-solver framework where an ``encoder'' network generalizes over the medium and sends context vectors to another ``solver'' network, which generalizes over the right-hand-sides. We show that this option is more efficient than the stand-alone variant. Lastly, we suggest a mini-retraining procedure, to improve the solver after the model is known. We demonstrate the efficiency and generalization abilities of our approach on a variety of 2D problems. |

09:00 | Meshless Remap of Native Fields for Earth System Models via Generalized Moving Least Squares PRESENTER: Paul Kuberry ABSTRACT. Coupling global Earth system models requires passing information between codes where the fields to be transferred exist on meshes that are independently discretized. The coupling takes place over the shared subdomain boundary, resulting in interface discretizations that are generally mismatched. Generalized moving least squares (GMLS) is a meshless reconstruction technique for generating an approximation of a field from nearby neighbor information. Reconstruction of fields defined on a sphere are performed by first approximating the tangent plane for each point where a field reconstruction is requested. The tangent plane approximation is then used as a local coordinate system on which sampling functionals are evaluated, acting on a basis. After solving a least squares problem, the field is then reconstructed in the local coordinate system, and transformed to the global coordinate system, if desired. The Compadre toolkit is used for the implementation of all described techniques. Numerical results will be presented demonstrating remap of several fields on grids where the degrees of freedom represent cell-averaged quantities, measuring both accuracy and conservation. |

09:25 | A Discontinuous-Galerkin-in-Time Framework for Multirate Time Integration of Interface-Coupled Problems PRESENTER: Jeffrey Connors ABSTRACT. A framework is presented to design multirate time stepping algorithms for two dissipative models with coupling across a physical interface. The coupling takes the form of boundary conditions imposed on the interface relating the solution variables for both models to each other. The multirate aspect arises when numerical time integration is performed with different time step sizes for the component models. We describe a unified approach to develop multirate algorithms for these problems. This effort is pursued though the use of discontinuous-Galerkin time stepping methods, acting as a general unified framework, with different time step sizes. The two models are coupled across user-defined intervals of time, called {\it coupling windows}, using polynomials that are continuous on the window. The coupling method is shown to reproduce the correct interfacial energy dissipation, discrete conservation of fluxes, and asymptotic accuracy. Also, the method could provide a limit for the total communication needed between components, independent of the number of time steps on a coupling window. In principle, methods of arbitrary order are possible. As a preliminary step, we focus on the presentation and analysis of monolithic methods for advection-diffusion models coupled via generalized Robin-type conditions. The monolithic methods could be computed using a Schur-complement approach, or by Schwarz-type iteration. We also discuss some recent progress for partitioned algorithms and different types of coupling conditions. |

09:50 | Air-Sea Light: Coupling a Simple Ocean-Atmosphere Model PRESENTER: Michael Gaiewski ABSTRACT. The Multilayer Thermal Rotating Shallow Water Equations are used to model weather and climate where layers of fluid of varying densities are “stacked” vertically. These equations track variables such as layer thicknesses, or “heights”, temperatures, and velocities over time. When considering an ocean-atmosphere system, there is a large jump in density at the interface between the atmosphere and the ocean. We use the so-called “bulk condition”, a Robin boundary condition, to couple the model together in an accurate and stable manner. We developed software called Air-Sea Light to implement the primitive equations and thermal shallow water equations for the ocean and atmosphere. In this talk we will present results from the model and demonstrate different coupling algorithms applied to the model. |

10:15 | A Lagrange Multiplier Partitioned Scheme for Coupling Reduced Order Models PRESENTER: Amy de Castro ABSTRACT. Partitioned methods for coupled problems permit independent solution for each constituent component, allowing for the reuse of existing single-component codes. Thus, partitioned methods can shorten code development and validation times for multiphysics and multiscale applications. This efficiency can be further increased by allowing at least one of the “codes” to be a projection based reduced order model (ROM). We simulate this scenario by considering a model interface problem that is discretized independently on two non-overlapping subdomains. On one of the subdomains, a ROM is constructed by performing proper orthogonal decomposition (POD) on a snapshot ensemble to obtain a low-dimensional reduced order basis, followed by a Galerkin projection onto this basis. This ROM is then coupled to either a traditional finite element model (FEM), or to a separate ROM constructed on the other subdomain, through a Lagrange multiplier representing the interface flux. The flux can be expressed as the solution of a Schur complement system stemming from the monolithic coupled problem. Application of an explicit time integration scheme at this juncture effectively decouples the subdomains, allowing for their independent solution at each time step. We present numerical results which show the method’s efficacy in achieving accurate ROM-FEM and ROM-ROM coupled solutions while decreasing the required computational time. |

10:40 | Interface Flux Recovery Framework for Constructing Partitioned Heterogeneous Time-Integration Methods PRESENTER: Chad Sockwell ABSTRACT. Coupling schemes serve as the glue which holds multi-physics simulations together and are ideally efficient, stable, and conservative in terms of flux. A common approach for partitioned schemes wishing to employ different time integrators to different subproblems is to lag the coupling terms in time. This can lead to accuracy issues, especially in multistage methods. In this talk, we present a novel framework for partitioned heterogeneous time-integration methods, which allows the coupling of arbitrary multistage methods without reducing their order of accuracy. The core idea of our approach is to effect the splitting through an auxiliary monolithic system, which is used to construct a polynomial-in-time approximation of the interface flux. The latter allows each subproblem to be solved independently at every time-step. Our framework differs from other approaches in that it allows a flexible choice of underlying time-integrators for the individual subproblems without compromising the time-accuracy at the coupled problem level. We demonstrate the method on a model problem representing a proxy of the tracer transport component in air-sea couplings in earth system models. We report numerical tests evaluating accuracy and flux conservation for different pairs of time-integrators in the explicit Runge-Kutta and Adams-Moulton families. |

09:00 | Polynomial Approximation to the Inverse of a Matrix PRESENTER: Ron Morgan ABSTRACT. We look at approximating the inverse of a matrix with a polynomial of the matrix. For a sparse matrix, this in some sense gives a sparse version of the inverse. We use a polynomial related to the GMRES polynomial and implement using the GMRES polynomial roots. We show that this polynomial can often give a remarkably accurate approximation to the inverse. For ill-conditioned matrices, a high degree polynomial is needed and a composite of two polynomials may be most efficient. Applications include large systems of linear equations with multiple right-hand sides. Challenges with stability and with indefinite matrices will be mentioned. |

09:25 | A comparative study of global coarsening and local smoothing multigrid methods on locally refined meshes for distributed matrix-free FEM computations PRESENTER: Peter Munch ABSTRACT. We have recently extended the geometric multigrid infrastructure of the open-source finite element library deal.II [1,2] with global coarsening multigrid features [3-5] in addition to the existing local smoothing multigrid functionalities [6]. These two geometric multigrid variants differ in the way how they define multigrid levels. While local smoothing uses all cells on a given refinement level for this definition, global coarsening obtains the levels by recursively globally coarsening the finer levels. In this presentation, we will compare in a fair and unbiased way the advantages and disadvantages of global coarsening and local smoothing algorithms from implementation and performance points of view for equivalent smoothers and on state-of-the-art CPU hardware. To our knowledge, this has not been done in the literature yet, since most projects only have implemented one of these approaches. The experiments are conducted in the setting of the current state of the art for higher-order finite element methods with matrix-free operator evaluation as the main computational kernel. We will present serial simulations as well as parallel weak and strong scaling on up to 147,456 CPU cores on the SuperMUC-NG supercomputer. Preliminary results indicate that global coarsening algorithms have a better parallel behavior for comparable smoothers due to their additional flexibility in load balancing. In the serial case, the high costs of applying hanging-node constraints in comparison to highly efficient matrix-free kernel evaluations might be a dominating factor, leading to advantages of local smoothing, which per construction ignores hanging-node constraints during smoothing, even though the number of solver iterations is typically slightly higher. In addition, we will show how a well-designed global coarsening implementation is extendable to (global) polynomial coarsening, also known as p-multigrid, for locally refined meshes and high-order FEM. The combination of such a p-multigrid solver, AMG and one of the geometric multigrid solvers described above allows to construct hybrid multigrid solvers [7] for locally refined meshes both for continuous and discontinuous Galerkin discretizations. Such a hybrid multigrid solver is used in the incompressible Navier-Stokes solver ExaDG [8] to solve the pressure Poisson problem on non-trivial coarse grids, e.g., of a lung geometry [9], with moderate number of refinement levels and moderate polynomial degrees. We will share our experiences made during the application of the new global-coarsening infrastructure and the hybrid multigrid solver built around it to real-world problems, particularly regarding the requirements posed to other parts of the code, which were not an issue previously when local smoothing was used. References [1] Arndt, D., Bangerth, W., Davydov, D., Heister, T., Heltai, L., Kronbichler, M., Maier, M., Pelteret, J.P., Turcksin, B. and Wells, D., 2021. The deal. II finite element library: Design, features, and insights. Computers & Mathematics with Applications, 81, pp.407-422. [2] Arndt, D., Bangerth, W., Blais, B., Fehling, M., Gassmöller, R., Heister, T., Heltai, L., Köcher, U., Kronbichler, M., Maier, M. and Munch, P., 2021. The deal. II library, version 9.3. Journal of Numerical Mathematics, 29(3), pp.171-186. [3] Sundar, H., Biros, G., Burstedde, C., Rudi, J., Ghattas, O. and Stadler, G., 2012, November. Parallel geometric-algebraic multigrid on unstructured forests of octrees. In SC'12: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (pp. 1-11). IEEE. [4] Becker, R. and Braack, M., 2000. Multigrid techniques for finite elements on locally refined meshes. Numerical linear algebra with applications, 7(6), pp.363-379. [5] Becker, R., Braack, M. and Richter, T., 2007. Parallel multigrid on locally refined meshes. In Reactive Flows, Diffusion and Transport (pp. 77-92). Springer, Berlin, Heidelberg. [6] Clevenger, T.C., Heister, T., Kanschat, G. and Kronbichler, M., 2021. A flexible, parallel, adaptive geometric multigrid method for FEM. ACM Transactions on Mathematical Software (TOMS), 47(1), pp.1-27. [7] Arndt, D., Fehn, N., Kanschat, G., Kormann, K., Kronbichler, M., Munch, P., Wall, W.A. and Witte, J., 2020. ExaDG: High-order discontinuous Galerkin for the exa-scale. In Software for Exascale Computing-SPPEXA 2016-2019 (pp. 189-224). Springer, Cham. [8] Fehn, N., Munch, P., Wall, W.A. and Kronbichler, M., 2020. Hybrid multigrid methods for high-order discontinuous Galerkin discretizations. Journal of Computational Physics, 415, p.109538. [9] Kronbichler, M., Fehn, N., Munch, P., Bergbauer, M., Wichmann, K.R., Geitner, C., Allalen, M., Schulz, M. and Wall, W.A., 2021, November. A next-generation discontinuous galerkin fluid dynamics solver with application to high-resolution lung airflow simulations. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (pp. 1-15). |

09:50 | Fast and robust multilevel low-rank tensor solvers for elliptic PDEs ABSTRACT. Multilevel low-rank tensor approximation has been rigorously analyzed for certain classes of functions solving elliptic second-order PDEs in up to three dimensions. These include, in particular, analytic functions, solutions with algebraic corner singularities and highly-oscillatory solutions to multiscale diffusion problems. In these settings, multilevel low-rank tensor methods have been shown to admit exponentially convergent approximations and hence to be comparable in efficiency to such techniques as spectral and hp approximations and multiscale FE approximation. The approximation power of tensor methods is based on the successive adaptive approximation in suitable low-dimensional subspaces, typically realized by the matrix SVD. In the case of multilevel representation, that amounts to the adaptive computation of low-parametric effective discretizations within extravagantly large but generic finite-element spaces. The use of low-rank tensor representation for these background finite-element spaces allows, in principle, refining them until the meshwidth reaches the level of machine precision if necessary. Computational algorithms exploiting such approximations are data-driven in the sense that the effective discretizations are adapted to the data and are constructed in the course of computation, not being confined by design to any specific bases (such as polynomial or piecewise-polynomial bases). However, in three dimensions, the complexity of solvers based on low-rank tensor refinement becomes prohibitively large when one pursues stability and convergence in the energy norm (for example, by employing rigorous preconditioning techniques). In this contribution, we revisit a BPX-type preconditioning technique developed recently for multilevel low-rank tensor methods and present a novel algorithm for applying the preconditioner. Exploiting an hierarchy of nested finite-element spaces, it dramatically speeds up numerical tensor-structured solvers based on the preconditioner. Our numerical experiments demonstrate the efficiency of the new approach in two and three dimensions, including the setting of multiscale diffusion with solutions exhibiting low regularity and high-frequency oscillations. |

10:15 | A sine transform based preconditioner for all-at-once systems from time-dependent PDEs PRESENTER: Sean Hon ABSTRACT. In this work, we propose a simple yet generic preconditioned Krylov subspace method for a large class of nonsymmetric block Toeplitz all-at-once systems arising from discretizing evolutionary partial differential equations. Namely, we propose a novel symmetric positive definite preconditioner, which can be efficiently diagonalized by the discrete sine transform matrix. Our approach is to first permute the original linear system to obtain a symmetric one, and subsequently develop the desired preconditioner based on the spectral symbol of the modified matrix. Then, we show that the eigenvalues of the preconditioned matrix sequences are clustered around $\pm 1$, which entails rapid convergence when the minimal residual method is devised. Our proposed method can be generalized for high-order backward difference time discretization schemes, which applies on a wide range of time-dependent PDEs. Numerical examples are given to demonstrate the effectiveness of our proposed preconditioner, which consistently outperforms an existing block circulant preconditioner. |

10:40 | Accelerating Jacobi Iteration for Solving Nonsymmetric Linear Systems of Equations using Scheduled Relaxation PRESENTER: Mohammad Islam ABSTRACT. The Scheduled Relaxation Jacobi (SRJ) method is a linear solver algorithm which greatly improves the convergence of the Jacobi iteration through the use of judiciously chosen relaxation factors (an SRJ scheme) which attenuate the solution error. Schemes have primarily been developed to accelerate the solution of elliptic PDEs (e.g. Poisson's equation) in the literature. The goal of this paper is to develop a methodology for constructing SRJ schemes which are suitable for solving non-elliptic PDEs (or equivalently, the nonsymmetric linear systems arising from the discretization of such PDEs). These schemes are obtained by numerically solving a constrained minimization problem which guarantees the solution error will not grow as long as the Jacobi iteration matrix associated with the linear system has eigenvalues which lie in certain elliptical regions of the complex plane. We demonstrate that these schemes accelerate the convergence of standard Jacobi iteration when solving the nonsymmetric linear systems arising from discretization of the 2D steady advection-diffusion equation. |

11:25 | Looking beyond polynomial bases for wave propagation with finite elements PRESENTER: Adeleke O Bankole ABSTRACT. The eﬀiciency of matrix-free operator application improves with polynomial degree up to about degree 7 and is remarkably stable beyond. The use of very high degree in wave propagation is moderated by two related factors: the Runge effect leads to small time steps and polynomials have relatively poor resolving power near the centers of elements. This can be observed by noting that the Gauss-Lobatto-Legendre points cluster quadratically toward the ends of the unit interval, and thus the Courant-Friedrichs-Lewy (CFL) criteria scales with h/p^2 . We explore a nonpolynomial basis and quadrature based on analytic conformal mappings proposed by Hale and Trefethen [2] that offer more uniform resolution with carefully designed regions of analyticity mapped from the ellipsoids associated with Gauss quadrature. We extend the local Fourier analysis toolkit [3, 4] to investigate the dispersion properties and impact on the CFL criteria. We also provide numerical examples to explore the accuracy and performance relative to polynomial bases for linear and nonlinear wave propagation problems using libCEED [1]. References [1] Brown J., Abdelfattah A., Barra V., Beams N., Camier J.S., Dobrev V., Dudouit Y., Ghaffari L., Kolev T., Medina D., Pazner W., Ratnayaka T., Thompson J.L., and Tomov S., libCEED: Fast algebra for high-order element-based discretizations. Journal of Open Source Software, 6(63):2945, (2021). [2] Hale N. and Trefethen N.L., New quadrature formulas from conformal maps. Siam J. Numer. Anal., Vol 46, No. 2, pp. 930-948 (2008). [3] Thompson J.L., Local Fourier Analysis of Domain Decomposition and Multigrid Methods for High-Order Matrix-Free Finite Elements. Dissertation, (2021). [4] Thompson J.L., LFAToolkit. https://github.com/jeremylt/LFAToolkit.jl, (2021). |

11:50 | Meshfree Geometric Multilevel Methods PRESENTER: Andrew Jones ABSTRACT. Multilevel methods can be efficient and robust preconditioners for Krylov subspace methods for solving linear systems that arise from discretizing elliptic partial differential equations (PDEs). Geometric multilevel methods typically use grid-based restriction and interpolation operators to reduce the size of the linear system that has to be solved and accelerate the convergence. In this work, we remove grid dependence by developing new meshfree restriction/interpolation operators based on radial basis functions (RBF) methods and combine these with a meshfree coarsening strategy. We apply these methods as a solver and preconditioner in a V-cycle iteration to meshfree discretizations of elliptic PDEs on surfaces. The meshfree methods utilized in this work are RBF finite differences (RBF-FD) and generalized finite difference (GFD). Results on scaling and convergence rates of the new meshfree multilevel method with different parameter choices are displayed. We demonstrate the improved performance of Krylov subspace methods preconditioned with algebraic multigrid (AMG) solvers, using both PyAMG and MueLu. |

12:15 | Residual and restarting in Krylov subspace evaluation of the Phi function PRESENTER: Mike Botchev ABSTRACT. An efficient Krylov subspace algorithm for computing actions of the phi matrix function for large matrices is proposed. This matrix function is widely used in exponential time integration, Markov chains, and network analysis and many other applications. Our algorithm is based on a reliable residual based stopping criterion and a new efficient restarting procedure. We analyze residual convergence and prove, for matrices with numerical range in the stable complex half-plane, that the restarted method is guaranteed to converge for any Krylov subspace dimension. Numerical tests demonstrate efficiency of our approach for solving large scale evolution problems resulting from discretized in space time-dependent PDEs, in particular, diffusion and convection-diffusion problems. An implementation of the proposed method, code phiRT, is available at https://team.kiam.ru/botchev/expm/. |

12:40 | Practical Considerations for the Adoption of Anderson Acceleration in Nonlinear Diffusion Accelerated Transport PRESENTER: Qicang Shen ABSTRACT. The use of the Anderson Acceleration (AA) method within the context of CMFD-accelerated neutron transport k-eigenvalue calculation is explored for practical problems. The current application of CMFD only has a linear convergence rate. The AA method is similar to a quasi-Newton and can improve the convergence rate of the fixed-point iterations. We implement the AA for the 2D calculation in MPACT. The initial results show that the AA method can improve the convergence rate in a trivial problem, and reduce the number of transport outer iterations and the runtime in practical problems. However, the success of AA in practical problems, where simulations in symmetry or the domain decomposition due to parallel computing are used, relies on one essential factor--the treatment of the domain boundary incoming angular flux. And the efficiency of AA was affected by the computational cost for solving the optimization problem and the computational cost of the CMFD solver. |

13:05 | Novel mass-based multigrid relaxation schemes for the Stokes equations ABSTRACT. In this talk, we propose three novel block-structured multigrid relaxation schemes based on distributive relaxation, Braess-Sarazin relaxation, and Uzawa relaxation, for solving the Stokes equations discretized by the mark-and-cell scheme. In our earlier work, we discussed these three types of relaxation schemes, where the weighted Jacobi iteration is used for inventing the Laplacian involved in the Stokes equations. In \cite{he2018local}, we show that the optimal smoothing factor is $\frac{3}{5}$ for distributive weighted-Jacobi relaxation and inexact Braess-Sarazin relaxation, and is $\sqrt{\frac{3}{5}}$ for $\sigma$-Uzawa relaxation. Here, we propose mass-based approximation inside of these three relaxations, where mass matrix $Q$ obtained from bilinear finite element method is directly used to approximate to the inverse of scalar Laplacian operator instead of using Jacobi iteration. Using local Fourier analysis, we theoretically derive the optimal smoothing factors for the resulting three relaxation schemes. Specifically, mass-based distributive relaxation, mass-based Braess-Sarazin relaxation, and mass-based $\sigma$-Uzawa relaxation have optimal smoothing factor $\frac{1}{3}$, $\frac{1}{3}$ and $\sqrt{\frac{1}{3}}$, respectively. Note that the mass-based relaxation schemes do not cost more than the original ones using Jacobi iteration. Another superiority is that there is no need to compute the inverse of a matrix. These new relaxation schemes are appealing. |

11:25 | Solving linear systems of the form ($A + \gamma UU^T){\bf x} = {\bf b}$ ABSTRACT. We consider the solution of large linear systems of the form $$(A + \gamma UU^T){\bf x} = {\bf b}$$ with $A$ of size $n\times n$, $\gamma >0$ a parameter and $U$ of size $n\times k$, by preconditioned Krylov subspace methods. We make the following assumptions: \begin{itemize} \item $A$ is possibly singular, but $A+\gamma UU^T$ is nonsingular; %\item $A+A^T$ is positive semidefinite; \item forming $A+\gamma UU^T$ explicitly is undesirable, for example due to loss of structure or sparsity in $A$; \item the dimension $k$ is much smaller than $n$, but not small. \end{itemize} Linear systems of this form arise in several areas of scientific computing, including the solution of the Stokes and Navier-Stokes problems with augmented Lagrangian methods, the solution of reduced KKT systems in constrained optimization, sparse-dense least-squares problems, the discretization of certain integro-differential equations, the solution of PDEs with non-local boundary conditions, and elsewhere. Solving such linear systems can be challenging, especially for large values of the parameter $\gamma$. We will present and investigate different variants of a preconditioning technique based on a suitable splitting of the coefficient matrix. The performance of these preconditioners will be illustrated by means of numerical experiments on a variety of test problems. This is joint work with Chiara Faccio (Scuola Normale Superiore, Pisa). |

11:50 | Preconditioned Iterative Methods for Structure-Preserving Discretisations PRESENTER: James Jackaman ABSTRACT. In recent years, there has been substantial work on the development of so-called "structure-preserving" discretisations of differential equations, where the numerical approximation reflects key conservation laws observed of the continuum solution. While such discretisations are unquestionably valuable in situations where physical relevance and fidelity are closely tied to the conservation laws, their practical utility is limited by the fact that standard iterative methods for solution of the resulting linear and nonlinear systems only resolve the underlying conserved quantities when solved to near-machine precision. In this talk, we present a generalisation of the (preconditioned) flexible GMRES algorithm that can preserve arbitrarily many such conserved quantities exactly at (nearly) any stopping tolerance, with a small additional cost. Numerical results are presented for several structure-preserving finite-element discretisations of linear parabolic and hyperbolic model problems. |

12:15 | A MULTIGRID METHOD FOR THE BIOT’S MODEL ON LOGICALLY RECTANGULAR GRIDS PRESENTER: Javier Zaratiegui ABSTRACT. Poroelasticity models are present in many societal relevant applications such as geothermal energy extraction, CO2 storage or hydraulic fracturing, among others. A general three-dimensional mathematical formulation for such models was established in [1], coupling the fluid flow and the mechanical deformation within a porous media. Numerical simulation is mandatory when modeling real applications, leading to an intense search for suitable discretizations methods, as well as robust and efficient olvers for the algebraic systems that arise from such problems. In this work, we consider a combined multipoint stress–multipoint flux mixed finite element method for the discretization of the quasi-static Biot’s model, recently introduced in [2]. This method is locally conservative, can be formulated on simplicial and quadrilateral meshes, and can handle discontinuous full tensor permeabilities and Lam´e coefficients, which are typically associated with subsurface flows. The strategy employed to solve the large systems of algebraic equations that arise from the discretization of Biot’s model becomes a crucial aspect in numerical simulations, since it requires high computational cost. Two important approaches are monolithic and iterative coupling methods. In this talk, we propose a monolithic technique based on geometric multigrid for the discretization scheme described in [2] when logically rectangular grids are considered. Such grids take advantage of recent computer architectures, so that the overall performance is improved by using structured data. Finally, numerical results are presented in order to illustrate the robustness of this new solver with respect to the physical and discretization parameters of the model. REFERENCES [1] M.A. Biot, General theory of three-dimensional consolidation. J. Appl. Phys., Vol. 12, pp. 155–164, 1941. [2] I. Ambartsumyan, E. Khattatov and I. Yotov, A coupled multipoint stress–multipoint flux mixed finite element method for the Biot system of poroelasticity. Comput. Methods Appl. Mech. Engrg., Vol. 372, 113407, 2020. |

12:40 | A stable mimetic finite-difference discretization for convection-dominated diffusion equations PRESENTER: Casey Cavanaugh ABSTRACT. Convection-diffusion equations arise in a variety of applications such as particle transport, electromagnetics, and magnetohydrodynamics. Simulation of the convection-dominated case, even with high-fidelity techniques, is particularly challenging due to sharp boundary layers and shocks causing jumps and discontinuities in the solution, and numerical issues such as loss of the maximum principle in the discretization. These complications cause instabilities, admitting large oscillations in the numerical solutions when using traditional methods. Drawing connections to the simplex-averaged finite element method (S. Wu and J. Xu, 2020), we develop a mimetic finite-difference (MFD) discretization using exponentially averaged coefficients to guarantee monotonicity of the scheme and stability of the solution as the diffusion coefficient approaches zero. The finite-element framework allows for transparent analysis of the MFD, such as proving well-posedness and deriving error estimates from the finite-element setting. Numerical tests are presented confirming the stability of the method and verifying the error estimates. |

13:05 | Efficient multigrid methods for high-order nonlinear solid mechanics on emerging architectures PRESENTER: Rezgar Shakeri ABSTRACT. High-order finite element methods are widely perceived to be too expensive for most engineering analysis due to the cost of residual assembly per degree of freedom (dof). Moreover, the Jacobian of a non-linear operator rapidly loses the sparsity and the number of nonzero entries per dof grows with $p^d$ as the order $p$ of basis function increases in $d$ dimension, which necessitate matrix-free implementation for more efficient finite element simulation. A matrix-free approach yields better performance, both in terms of storage and solve time over matrix-based approaches when higher order discretizations are utilized. Due to ill-conditioning of the elliptic problem, preconditioning is required to ensure the fast convergence of iterative solvers. One of the large class of preconditioners is based on multigrid methods. Geometric multigrid methods are common preconditioner for elliptic partial differential equations. $H$-multigrid, which is the typical type of geometric multigrid used for low-order discretizations, coarsens the mesh by aggregating multiple elements. However, with unstructured meshes aggregating fine grid elements to form the coarse grid representation can be somewhat complex. $P$-multigrid, which coarsens the mesh by reducing the order of the basis functions on each element, can be a more natural preconditioner for unstructured meshes with high-order elements. Another multigrid preconditioner is Algebraic multigrid (AMG) where the restriction operators are constructed algebraically without requiring any mesh information. These methods use only information of the matrix sparsity and its entries of the assembled operators which discards the performance benefits of using a matrix-free representation for high-order implementation. Therefore, we use geometric multigrid with $p$ coarsening and utilizing AMG as the coarse grid solver along with a Chebyshev polynomial smoother in each multigrid cycle. In contrast to $h$-multigrid, each level in $p$-multigrid is related to a different approximation polynomial order $p$, instead of the element size $h$. We investigate a matrix-free high-order finite element discretization with Newton-Krylov iterative solvers and $p$-multigrid preconditioning for the hyperelasticity problem at finite strain on unstructured 3D meshes in parallel. Domain decomposition, parallel assembly, and communication over all processors are performed by employing the parallel computing library PETSc and the discretized matrix-free operator is evaluated in libCEED. libCEED is a library that offers a purely algebraic approach with a run-time selection of implementations tuned for a variety of computational device types, including CPUs and GPUs. libCEED provides a low-level Application Programming Interface (API) for users, where they describe finite element operators and the library coordinates the action of the selected backend to the original operator independently on each device/MPI task. In this talk, we investigate the performance of high-order finite element implementation on different CPU and GPU architectures, along with $p$-multigrid preconditioning for our 3D hyperelastic models using unstructured mesh and polynomials of order $p=1$ through $p=4$. |

11:25 | Waveform iterations for thermal coupling PRESENTER: Philipp Birken ABSTRACT. We discuss iterative methods for the partitioned time-integration of multiphysics problems, which commonly exhibit a multiscale behavior, requiring independent time-grids. Specifically, we consider unsteady thermal fluid structure interaction (FSI), modelled by heterogeneous coupled linear heat equations The ideal method for solving these problems allows independent and adaptive time-grids, higher order time-discretizations, is fast and robust, and allows the coupling of existing subsolvers, executed in parallel. Waveform relaxation (WR) methods can potentially have all of these properties. These iterate on continuous-in-time interface functions, obtained via suitable interpolation. The difficulty is to find suitable convergence acceleration, which is required for the iteration to converge quickly. We present a fast and highly robust, second order in time, adaptive WR method. Basis is a Dirichlet-Neumann coupling at the interface. The method uses an analytically optimal relaxation parameter derived for the fully-discrete scheme in 1D. This parameter depends on discretization and problem, and in practice leads to very fast linear convergence. Numerical results demonstrate the robustness of the approach. We further present a novel, parallel WR method, using asynchronous communication techniques during time integration to accelerate convergence. Instead of exchanging interpolated time-dependent functions at the end of each time-window or iteration, we exchange time-point data immediately after each timestep, by making use of one sided communication. The analytical description and convergence results of this method generalize existing WR theory. |

11:50 | A highly parallel, probabilistically-based solver for nonlinear PDEs in the exascale PRESENTER: Jorge Morón ABSTRACT. PDDSparse is an iterative probabilistic domain decomposition (PDD) algorithm designed to solve large-scale elliptic boundary-value problems (BVPs). As in any PDD approach [1], the domain is divided into subdomains and the solution is computed on the fictitious interfaces using the probabilistic representation of BVPs provided by the Feynman-Kac formula, extended here to nonlinear operators via Fréchet calculus [2]. Then, the nonlinear PDE is recast into a sequence of linearised ones, and the quantities leading to the solution at each point of the interface on each linearised iterate can be computed nearly embarrassingly in parallel by Monte Carlo, thus suiting the trend of modern supercomputer architectures towards GPU-accelerated cores. Another point of PDDSparse is that diffusions departing from an interface are confined to a few subdomains, dramatically shortening their mean exit times; and allowing PDDSparse to efficiently distribute the PDE data among the nodes. Moreover, the nonlinear loop lends itself naturally for using iterative control variates which reduce the variance and hence speed up the simulation, in a way reminiscent of multigrid. Along with this novel (and very non standard) formulation, we will provide in this talk proof-of-concept results and empirical scalability tests of a C++/CUDA implementation performed at the CINECA supercomputer facility. [1] F. Bernal, G. dos Reis, and G. Smith. Hybrid PDE solver for data-driven problems and modern branching, European J. Appl. Math. 28(6), 949-972 (2017). [2] F. Bernal. Trust-Region Methods for Nonlinear Elliptic Equations with Radial Basis Functions, Computers & Mathematics with Applications, 72(7),1743-1763 (2016) |

12:15 | A robust preconditioner for Lippmann Schwinger equation in two dimension PRESENTER: V A Kandappan ABSTRACT. The scattering problems (in electromagnetics, acoustics, geophysics etc.) follow the time-harmonic Helmholtz equation, which on reformulating in volume integral form results in the Lippmann Schwinger equation. The discretization of the Lippmann Schwinger equation results in a dense linear system. It has been established that certain subblocks of the resulting dense linear system are low-rank. Using Hierarchical trees to represent these low-rank blocks reduces the memory and computational effort involved in solving such dense systems. So it is natural to consider these hierarchical low-rank representations to accelerate the mat-vecs involved in Iterative Solvers. One such Hierarchical representation is HODLR(Hierarchically Off-Diagonal Low Rank) Matrices. HODLR matrices consider all the off- diagonal blocks as low-rank. It works optimally for one-dimension problems, whereas for two dimensions problems, the rank of the off-diagonal blocks in- creases as a function of the size of the off-diagonal block. In this talk, we address this issue and provide a new hierarchical representation that suits the problems in 2D. In addition to that, we discuss a preconditioner based on that 2D specific hierarchical representation. The preconditioner developed is based on two things, one is the new 2D hierarchical representation, and the other one is domain decomposition. We will discuss the features of this pre-conditioner when applied to the iterative solver for the Lippmann Schwinger equation and provide its efficiency. |

12:40 | Model order reduction for coupled systems PRESENTER: Niccolò Discacciati ABSTRACT. Mutual interactions in multi-component systems often lead to emergent phenomena, i.e., system behaviors that cannot be expected by studying the individual components separately. Practical applications, including multi-query simulations, demand accurate and computationally fast methods able to detect such behaviors. This can be achieved through Model Order Reduction (MOR) techniques. However, in many cases, simulations of the fully coupled model are unfeasible. The coupling functions may not be a-priori available, physical or numerical incompatibilities among the components may be present, or the computational cost of the coupled problem is too large. In this work, we propose a MOR technique that constructs local independent surrogate models, whose interaction approximates the behavior of a given coupled problem at a reduced computational cost. In the offline phase we construct the local models by suitably parametrizing the interface data, while in the online phase we combine them through domain decomposition techniques. We show the potential of the proposed technique in terms of accuracy, computational performance, and robustness in a series of test cases, including nonlinear and multiphysics problems. |

13:05 | A Multidimensional Proxy Point Method and Its Applications ABSTRACT. Fast direct hierarchical algorithms or iterative preconditioning schemes for large, dense matrices require efficient low-rank approximation methods to obtain their computational cost savings. There are several ways of obtaining such approximations depending on the type of matrix involved. For kernel matrices, analytic approximation methods have been used in the Fast Multipole Method and other hierarchical matrix algorithms. In this talk, we focus on the proxy point method, in which pairwise interactions between two separated clusters are approximated using the interactions of each cluster with a smaller chosen set of "proxy" points that separate the clusters. One disadvantage of the proxy point method is uncertainty about which and how many proxy points are necessary to obtain a given approximation accuracy: existing analysis is highly context-dependent, especially in the case of kernel functions of more than one real variable. Here, we describe a new, general proxy point method for functions of more than one variable and its accuracy analysis. We then give its applications in a variety of contexts. |