View: session overviewtalk overview

09:00 | Smoothed Aggregation for Difficult Stretched Mesh and Coefficient Variation Problems PRESENTER: Ray Tuminaro ABSTRACT. Four adaptations of the smoothed aggregation algebraic multigrid (SA-AMG) method are proposed with an eye towards improving the convergence and robustness of the solver in situations when the discretization matrix contains many weak connections. These weak connections can cause higher than expected levels of fill-in within the coarse discretization matrices and can also give rise to sub-optimal smoothing within the prolongator smoothing phase. These smoothing drawbacks are due to the relatively small size of some diagonal entries within the filtered matrix that one obtains after dropping the weak connections. The new algorithms consider modifications to the Jacobi-like step that defines the prolongator smoother, modifications to the filtered matrix, and also direct modifications to the resulting grid transfer operators. Numerical results are given illustrating the potential benefits of the proposed adaptations. |

09:25 | Energy-minimization for Algebraic Multigrid: Robustness and Performance PRESENTER: Andrea Franceschini ABSTRACT. Algebraic Multigrid (AMG) is a popular iterative method, with widespread use due to its effectiveness in solving linear systems arising from PDE discretizations. Among different advantages, the key feature of AMG is the optimality, i.e., the ability to guarantee a convergence rate independent of the mesh size. This is obtained through a good interplay between the smoother and the interpolation. Unfortunately, for difficult problems, such as those arising from structural mechanics or diffusion problems with large jumps in the coefficients, standard smoothers and interpolation techniques are not enough to ensure fast AMG convergence. In these cases, it is necessary to rely on more advanced interpolation strategies to enhance AMG robustness. In this talk, we present an improved prolongation operator inspired by the energy minimization concept [1], and show how this minimization can be cast as a constrained minimization problem. The constraint is twofold: the prolongation operator must be sparse, and its range must represent the operator near-kernel. A central goal of this work is to improve the interpolation quality for certain fine-grid nodes where standard techniques struggle. Unfortunately, it is quite common in vector-valued PDEs that some fine-grid nodes require large prolongation entries thus jeopardizing the conditioning of the subsequent levels. Another goal is that of measuring the quality of the C/F splitting using near-kernel information, before computing prolongation weights, with the goal to eventually improve the splitting, as is done for instance with compatible relaxation. To solve this constrained minimization problem, we propose and analyze a restricted Krylov subspace iterative procedure and discuss preconditioning approaches aimed at speeding up the setup stage. Finally, thanks to some numerical experiments, we demonstrate how the convergence rate can be significantly increased at a reasonable setup cost. [1] Olson, L. N., Schroder, J. B. & Tuminaro, R. S. A General Interpolation Strategy for Algebraic Multigrid Using Energy Minimization. SIAM Journal on Scientific Computing 33, pp. 966-991, 2011. |

09:50 | Adaptive algebraic multigrid based on algebraic a posteriori error estimation PRESENTER: Kaiyi Wu ABSTRACT. The construction of efficient algebraic multigrid methods to solve large-scale linear systems such as systems with weighted graph Laplacians demands flexible setup and solve phases. In particular, the setup phase of the AMG algorithm calls for a reasonable estimate of the error and adjusting the coarse space hierarchy to improve the convergence rate. Such a task, however, may be computationally expensive. This talk presents a nearly-linear scheme to compute one such error estimator using a Helmholtz decomposition on sparse graphs. By solving a linear system on the spanning tree and a least-squares problem on the cycle space, a localized error estimator is formed to approximate errors on each edge of the graph. This error estimator can be applied to various $\alpha$AMG algorithms to build coarse space hierarchies at a modest cost. We also explore other error estimators based on operator preconditioning techniques. Numerical experiments are presented to demonstrate the efficacy of the path cover adaptive algebraic multigrid (PC-$\alpha$AMG) using a posteriori error estimate for discretized second-order elliptic partial differential equations and large-scale real-world graph Laplacians problems. |

10:15 | A Matrix-Free Approach for Algebraic Multigrid ABSTRACT. We present a matrix-free approach approach for algebraic multigrid for systems of partial differential equations with collocated degrees of freedom. This approach constructs the prolongation and restriction operators by considering an auxiliary operator defined by the distance Laplacian matrix and then performing smoothed aggregation on the auxiliary operator. We provide results to verify the validity of the method, including parallel results using GPUs. |

10:40 | Generalizing Reduction-Based Algebraic Multigrid PRESENTER: Tareq Uz Zaman ABSTRACT. Algebraic Multigrid (AMG) methods are often robust and effective solvers for solving the large and sparse linear systems that arise from discretized PDEs and other problems, relying on heuristic graph algorithms to achieve their performance. Reduction-based AMG (AMGr) algorithms attempt to formalize these heuristics, by providing two-level convergence bounds that depend concretely on properties of the partitioning of the given matrix into its fine- and coarse-grid degrees of freedom. MacLachlan and Saad (SISC 2007) proved that the AMGr method yields provably robust two-level convergence for symmetric and positive-definite matrices that are diagonally dominant, with convergence factor bounded as a function of a coarsening parameter. However, when applying AMGr algorithms to matrices that are not diagonally dominant, not only do the convergence factor bounds not hold, but measured performance is notably degraded. In this talk, we present modifications to the classical AMGr algorithm that improve its performance on matrices that are not diagonally dominant. In particular, two aspects of AMGr interpolation are considered. First, in the original work on AMGr, assumptions of M-matrix or diagonally dominant structure removed the need to consider strength of connections, which become important when considering problems, such as anisotropic diffusion equations, that lead to non-diagonally dominant systems. Here, we find that lumping positive off-diagonal connections is important to achieving robustness. Secondly, to generalize the diagonal approximations of $A_{\!{f}\!{f}}$ used in classical AMGr, we use a sparse approximate inverse (SPAI) method, with nonzero pattern determined by strong connections, to define the AMGr-style interpolation operator, coupled with rescaling based on relaxed vectors. We present numerical results demonstrating the robustness of this approach for non-diagonally dominant systems. |

09:00 | A Low-Rank Solver for the Stochastic Unsteady Navier-Stokes Equations PRESENTER: Howard Elman ABSTRACT. We study a low-rank iterative solver for the unsteady Navier-Stokes equations for incompressible flows with a stochastic viscosity. The equations are discretized using the stochastic Galerkin method, and we consider an all-at-once formulation where the algebraic systems at all the time steps are collected and solved simultaneously. The problem is linearized with Picard's method. To efficiently solve the linear systems at each step, we use low-rank tensor representations within the Krylov subspace method, which leads to significant reductions in storage requirements and computational costs. Combined with effective mean-based preconditioners and the idea of inexact solve, we show that only a small number of linear iterations are needed at each Picard step, and we demonstrate the impact of rank-compression on solution accuracy. The proposed algorithm is tested with a model of flow in a two-dimensional symmetric step domain with different settings to demonstrate computational efficiency. |

09:25 | Monte Carlo Estimators for the Diagonals of a Matrix PRESENTER: Arvind Saibaba ABSTRACT. Estimating the diagonals of symmetric matrices, that are accessible only through matrix vector products, is an important problem in uncertainty quantification, sensitivity analysis, network science, etc. We estimate the diagonal with Monte Carlo estimators based on Rademacher random vectors and present probabilistic bounds for the normwise relative errors. Our bounds do not depend on the matrix dimension, and suggest that the accuracy of the estimators depends on the amount of diagonal dominance. This is corroborated by numerical experiments on synthetic test matrices. I will also present LanczosMC a method for estimating the diagonals of the inverse of a symmetric positive definite matrix that combines a preconditioned Lanczos low-rank approximation for the inverse with a Monte Carlo estimator. |

09:50 | Matrix-based Redistribution for Improved Coarse Level Scaling in Multilevel Monte Carlo PRESENTER: Hillary Fairbanks ABSTRACT. Multilevel Monte Carlo (MLMC) is a popular variance reduction approach with which to perform uncertainty quantification (UQ), as it accelerates the estimation of statistical quantities of interest by exploiting a hierarchy of coarse grid (level) discretizations of partial differential equations (PDEs). For this approach, it is necessary that the coarse levels provide guaranteed accuracy. If the coarse levels are obtained by algebraic means, such as in element agglomerated coarsening, such an accuracy (referred as to upscaling) is achievable. If the coarse levels utilize all available processors, the MLMC simulations are scalable if the respective linear solvers are scalable. In this work, we exploit an element agglomerated coarsening based on redistributing data to a reduced number of processors, which ensures both guaranteed accuracy (needed for the variance reduction in MLMC) and improves the solvers and MLMC scalability on coarse levels. The proposed redistribution utilizes only parallel sparse-matrix operations. This is completed during the AMG hierarchy construction to easily incorporate redistribution into MLMC. In this presentation, we will provide a high-level overview of this redistribution approach and provide numerical examples to show improvements in coarse grid scaling. |

10:15 | A Multilevel Approach to Stochastic Trace Estimation PRESENTER: Eric Hallman ABSTRACT. One common method for estimating a spectral sum f(A) is to apply Hutchinson's method to p(A), where p is a polynomial approximation to f over an interval containing the eigenvalues of A. The number of matrix-vector products required by a straightforward implementation is equal to the number of samples times the degree of the approximating polynomial. We show how to use multilevel Monte Carlo techniques to reduce this cost, specifically by decomposing the polynomial p(A) into two or more terms. When some terms are cheap to sample and others have low variance, sampling a well-chosen combination of the terms can outperform the standard single-level estimator. We present both theoretical results and numerical experiments demonstrating the effectiveness of the multilevel method. |

10:40 | Improved variants of the Hutch++ algorithm for trace estimation. PRESENTER: David Persson ABSTRACT. This work is concerned with two improved variants of the Hutch++ for estimating the trace of a square matrix A implicitly given through matrix-vector products. Hutch++ combines randomized low-rank approximation in a first phase with stochastic trace estimation in a second phase. In turn, Hutch++ only requires O(eps^(-1)) matrix-vector products to approximate the trace within a relative error eps with high probability. This compares favorably with the O(eps^(-2)) matrix-vector products needed when using stochastic trace estimation alone. In Hutch++, the number of matrix-vector products is fixed a priori and distributed in a prescribed fashion among the two phases. In this work, we derive an adaptive variant of Hutch++, which outputs an estimate of trace that is within some prescribed error tolerance with a prescribed success probability, while splitting the matrix-vector products in a near-optimal way among the two phases. For the special case of a symmetric positive semi-definite matrix, we present another variant of Hutch++, called Nyström++, which utilizes the so-called Nyström approximation and requires only one pass over the matrix as compared to two passes with Hutch++. We prove that the theoretical results on Hutch++ extend to Nyström++. Numerical experiments demonstrate the effectiveness of our two new algorithms. |

09:00 | Multiscale Finite Element Methods for an Elliptic Optimal Control Problem with Rough Coefficients PRESENTER: Jose Garay ABSTRACT. The solution of multiscale elliptic problems with non-separable scales and high contrast in the coefficients by standard Finite Element Methods (FEM) is typically prohibitively expensive since it requires the resolution of all characteristic lengths to produce an accurate solution. Numerical homogenization methods such as Localized Orthogonal Decomposition (LOD) methods provide access to feasible and reliable simulations of such multiscale problems. These methods are based on the idea of a generalized finite element space where the generalized basis functions are obtained by modifying standard coarse FEM basis functions to incorporate relevant microscopic information in a computationally feasible procedure. Using this enhanced basis one can solve a much smaller problem to produce an approximate solution whose accuracy is comparable to the solution obtained by the expensive standard FEM. Based on the LOD methodology, we investigate multiscale finite element methods for an elliptic distributed optimal control problem with rough coefficients. Numerical results obtained with our parallel implementation of the method illustrate our theoretical results. |

09:25 | ON PARAMETER CASCADE APPROACH TO SOLVING CONSTRAINED NONLINEAR LEAST SQUARES PROBLEMS WITH APPLICATIONS TO EPIDEMIOLOGY PRESENTER: Alexandra Smirnova ABSTRACT. We present a new iteratively regularized numerical algorithm for solving nonlinear constrained optimization problems. The main goal of this research is to mitigate significant computational challenges shared by traditional deterministic methods and methods based on Bayesian approach to nonlinear optimization, which both require numerical solutions to possibly complex systems of ordinary or partial differential equations at every step of the iterative process. Our proposed methodology builds upon all-at-once formulation, recently developed for constrained nonlinear least squares problems by B. Kaltenbacher and her co-authors (2018), the generalized profiling methods by J. Ramsay, G. Hooker, D. Campbell, and J. Cao for estimating parameters in nonlinear ordinary differential equations (2016), and the so-called traditional route for solving nonlinear ill-posed problems, pioneered by A. Bakushinsky (1991). Similar to all-at-once approach, the novel parameter cascade method no longer requires numerical solutions to the constraint equation(s) in order to find the state variable as a function of the current structural parameter. That reduces the cost of a quasi-Newton step and incorporates extra layer of stability in the proposed scheme. At the same time, the parameter-cascade structure of the new method avoids the difficulty of dealing with large solution spaces resulting from all-at-once make-up, which inevitably gives rise to oversized Jacobian and Hessian approximations. Therefore, the new algorithm has the potential to save time and storage, which is critical when multiple runs of the iterative scheme are carried out for uncertainty quantification. Finally, the new procedure is rooted in iterative regularization, which allows to incorporate a broad range of filters, discretization schemes, and nonlinear optimization algorithms leading to enhanced numerical efficiency as compared to generalized smoothing approach by J. Ramsay, G. Hooker, D. Campbell, and J. Cao. To assess numerical efficiency of the proposed algorithm, a parameter estimation inverse problem in epidemiology is considered. References 1. G. Hooker, J. O. Ramsay, L. Xia. [2016] CollocInfer: Collocation Inference in Differential Equation Models, Journal of Statistical Software, 75(2), 1–52 2. B. Kaltenbacher, B. [2018] Minimization based formulations of inverse problems and their regularization, SIAM J. Optim. 28(1), 620--645 3. A. Smirnova, A. Bakushinsky, [2020] On Iteratively Regularized Predictor-Corrector Algorithm for Parameter Identification, Inverse Problems, 36, 124008 |

09:50 | Iterative methods for interior point algorithms in the $L^2$ Optimal Transport Problem PRESENTER: Enrico Facca ABSTRACT. The Optimal Transport theory studies the problem of moving one non-negative density into another, minimizing the total cost of displacement. When the transport cost per unit mass is the squared distance, the problem can be rewritten in the so-called Benamou-Brenier formulation, a PDE-constrained optimization problem. This formulation provides a natural way to interpolate between probability densities, with applications in image processing, inverse problems, and generation of adaptive grids. The Benamou-Brenier problem has been solved for a long time using first-order methods, that typically require a large number of iterations to obtain accurate solutions. In "Computation of Optimal Transport with Finite Volumes" (A. Natale, G. Todeschi, ESAIM: Mathematical Modelling and Numerical Analysis, 55, 2021) the authors proposed a second-order approach based on interior point methods to solve the finite-dimensional counterpart of the Benamou-Brenier problem obtained using Finite Volumes. This approach has been shown to converge to the optimal solution with a fixed number of interior point steps, independently on the size of the discretized problem. However, these results were obtained using direct linear solver for solving the saddle point systems involved in Newton iteration used by the interior point method. This poses serious limitations to the application of the proposed method to large problems, thus iterative methods must be adopted. We present the difficulties arising in the design of efficient preconditioners for these saddle point linear systems, and some solutions we devised to cope with them. |

10:15 | Solving Optimal Control Problems Containing Uncertain Coefficients with Stochastic Discontinuous Galerkin Methods PRESENTER: Pelin Çiloğlu ABSTRACT. In this talk, we investigate a numerical analysis of a robust deterministic optimal control problem subject to a convection diffusion equation containing uncertain inputs. Stochastic discontinuous Galerkin method provides an efficient alternative to sampling methods for the numerical solution of the underlying optimization problems. In order to compute the stochastic discontinuous Galerkin solution for a given problem, one needs to solve a large coupled system of linear equations in the form of a sum of tensor product matrices. We address this issue by using the low-rank variant of GMRES method, which reduces both the computational complexity and the memory requirements by exploiting a Kronecker–product structure of system matrices. The efficiency of the proposed methodology is illustrated by the benchmark problems with and without control constraints. |

10:40 | χ2 Test for TV regularization parameter selection ABSTRACT. Total Variation (TV) regularization has shown to be effective in recovering parameters with edges or discontinuities. As with any regularization method, regularization parameter selection is important. For example, the discrepancy principle is a classical approach whereby data are fit to a specified tolerance. The tolerance can be identified by noting the least squares data fit follows a χ^2 distribution. However, this approach fails when the number of parameters is greater than or equal to the number of data. Heuristics are then employed to identify the tolerance in the discrepancy principle ,and this leads to oversmoothing. In this work we identify a χ^2 test for TV regularization parameter selection. We prove that the degrees of freedom in the TV regularized residual is the number of data and this is used to identify the appropriate tolerance in a discrepancy principle approach. The importance of this work lies in the fact that the χ^2 test introduced here for TV automates the choice of regularization parameter selection and can straightforwardly be incorporated into any TV algorithm. Results are given for three test images and compared to results using the discrepancy principle and MAP estimates. |

11:25 | A fully algebraic spectral coarse space for overlapping Schwarz methods PRESENTER: Alexander Heinlein ABSTRACT. Discretizing partial differential equations often results in sparse systems of linear equations. High spatial resolutions lead to large systems, which can be solved efficiently using iterative methods. A suitable class of solvers are Krylov methods preconditioned by domain decomposition preconditioners, which are scalable and robust for a wide range of problems. Unfortunately, highly heterogeneous problems arising, for instance, in the simulation of composite materials or porous media generally lead to unfavorable distributions of the eigenvalues of the system matrix that cause slow convergence for many solvers, including classical domain decomposition preconditioners. In order to retain robustness of domain decomposition methods, the coarse space can be enriched by additional coarse basis functions computed from eigenmodes of local generalized eigenvalue problems, leading to so-called spectral or adaptive coarse spaces. The development of spectral coarse spaces for domain decomposition methods has been a very active topic within the last decade. However, until recently, the algebraic construction of robust spectral coarse spaces, that is, using only the fully assembled system matrix without additional Neumann matrices or geometrical information, has still been on open problem. This talk deals with a specific class of adaptive coarse spaces for overlapping Schwarz methods which are based on a partition of the interface of the corresponding nonoverlapping domain decomposition. An algebraic and robust spectral coarse space is then constructed by solving two eigenvalue problems on each edge of a two-dimensional domain decomposition. One of them is based on optimal local approximation spaces that are also successfully employed in the construction of multiscale discretizations. The resulting condition number bound is independent of the contrast of the coefficient function, indicating the robustness of the method. In order investigate the robustness of this new method numerically, numerical results for different coefficient distributions with large jumps are presented, including typical examples with channels, random distributions, and coefficient distributions generated from the SPE10 benchmark. Furthermore, randomized eigensolvers are employed to improve the efficiency of solving the eigenvalue problems. |

11:50 | Polynomial Reduction with Full Domain Decomposition Preconditioner for Spectral Element Poisson Solvers PRESENTER: Pedro Bello-Maldonado ABSTRACT. Rapid solution of the Poisson equation is of primary importance in many scientific computing applications. In fluid mechanics, a Poisson equation for the pressure enforces the incompressibility of the flow at each timestep. In the Poisson-Nernst-Planck equations, electroneutrality is enforced by the potential, which is also governed by a Poisson problem. In these mathematical models, the constraints represent the fastest phenomena and the Poisson evaluation is thus the stiffest, and computationally slowest, substep in time-advancement of the simulation. For three-dimensional problems on unstructured meshes, efficient solution of the Poisson equation requires preconditioned iterative solvers. Because these problems are unsteady, there is a particular emphasis on minimizing solver costs, as opposed to setup costs, which are typically amortized over hundreds or thousands of timesteps. In large-scale parallel environments, minimizing communication is central to reducing the iterative solver overhead, particularly on GPU-based systems where relative inter-node communication latency is high. To address the performance concerns, the authors explore a low-communication multilevel Poisson solver based on the range-decomposition ideas of Bank [2], Mitchell [3], and Appelhans [1] et al. for a parallel solution of the Poisson problem discretized using the spectral element method (SEM) [6]. The approach begins with a standard domain-decomposition partitioning of the discretized Poisson equation distributed across MPI ranks, with one GPU per rank. An outer non-restarted GMRES solver drives the iterative solution and it is preconditioned with a multilevel algorithm, denoted as polynomial reduction with full domain decomposition (PR+FDD). For the preconditioner, each MPI rank retains an image of the residual on the full domain, with only the local degrees-of-freedom (DOFs) being fully represented on the given rank. The remaining DOFs are represented on a coarsened composite grid in which the polynomial order of the elements adjacent to the local partition is successively reduced, layer-by-layer, to unity. Outside of this $p$-type reduction layer, geometric coarsening is applied all the way to the domain boundary to complete a composite grid that is representative of the fine, global grid. During the preconditioning step, the residual is restricted to the different levels of the coarsening tree and communicated so that each processor can solve its local problem independently. The local Poisson composite problems are solved with an inner GMRES iterative solver preconditioned with algebraic multigrid (AMG) and a low-order collocated mesh as in [3]. In the spirit of the restrictive additive Schwarz (RAS) approach of Sarkis and Cai [4], the solutions to the composite problems are discarded at the end of the preconditioning step and only the processor/local DOFs are retained for the preconditioner output. All communication is thus focused into a single off-GPU phase, thereby reducing the impact of inter-GPU latency. This study explores the key parameters of the algorithm (e.g., coarsening strategies and intermediate-iteration tolerances) for a variety of problems over a range of processor counts using the NVIDIA V100s on the OLCF Summit supercomputer. Weak and strong scaling studies are presented and results are compared with other SEM preconditioners such as low-order preconditioning, hybrid-Schwarz-multigrid, and Chebyshev-Schwarz methods. On structured domains, our method achieves a solve time of 0.39 s with 8 billion DOFs on 4096 GPUS, surpassing the 0.54 s of geometric multigrid (GMG) for the same problem. On unstructured domains, we demonstrate the effectiveness of the preconditioner in reducing the number of outer iterations, achieving a 4-8 times reduction compared to low-order preconditioning on different computational domains. References: [1] D. J. Appelhans, T. Manteuffel, S. McCormick, and J. Ruge, "A low-communication, parallel algorithm for solving PDEs based on range decomposition", Numerical Linear Algebra with Applications, 24 (2017), p. e2041. [2] R. Bank, R. Falgout, T. Jones, T. A. Manteuffel, S. F. McCormick, and J. W. Ruge, "Algebraic Multigrid Domain and Range Decomposition (AMG-DD/AMG-RD)", SIAM Journal on Scientific Computing, 37 (2015), pp. S113-S136. [3] P. D. Bello-Maldonado and P. F. Fischer, "Scalable Low-Order Finite Element Preconditioners for High-Order Spectral Element Poisson Solvers", SIAM Journal on Scientific Computing, 41 (2019), pp. S2-S18. [4] X. Cai and M. Sarkis, "A Restricted Additive Schwarz Preconditioner for General Sparse Linear Systems", SIAM Journal on Scientific Computing, 21 (1999), pp. 792-797. [5] W. Mitchell, R. Strzodka, R. D. Falgout, and S. F. McCormick, "Parallel Performance of Algebraic Multigrid Domain Decomposition (AMG-DD)", ArXiv, abs/1906.10575 (2019). [6] A. T. Patera, "A spectral element method for fluid dynamics: Laminar flow in a channel expansion", Journal of Computational Physics, 54 (1984), pp. 468-488. |

12:15 | Restricted Additive Schwarz Preconditioned Exact Newton with Algebraic Coarse Space PRESENTER: Chaiyod Kamthorncharoen ABSTRACT. Restricted Additive Schwarz Preconditioned Exact Newton method (RASPEN) is a nonlinear preconditioning of Newton's method for solving the nonlinear algebraic systems of equations which often result from the discretisation of partial differential equations (PDEs). The idea of RASPEN is to transform the original nonlinear system to an equivalent system by the use of local subdomain solutions. The transformed system is supposed to be easier to solve by Newton's method. A coarse-level correction is exploited to overcome the scalability concern since one-level schemes have an issue with scalability when increasing the number of subdomains. In this study, we introduce a coarse space which can be constructed in the purely algebraic manner and does not need a coarse mesh. It is based on Nicolaides coarse space with some extensions and the setup of the coarse problem can be done in parallel. We apply RASPEN on different discretisation schemes like Finite Element and discontinuous Galerkin, on stationary and instationary problems, on scalar-value problem and complex system of PDEs. The numerical experiments show the flexibility of RASPEN and the effectiveness of the two-level approach. |

12:40 | BDDC Preconditioned P-Multigrid for High-Order Finite Elements PRESENTER: Jeremy L Thompson ABSTRACT. High-order matrix-free finite element operators offer superior performance on modern high performance computing hardware when compared to assembled sparse matrices, both with respect to the number of floating point operations needed for operator evaluation and the memory transfer needed for a matrix-vector product. However, high-order matrix-free operators require iterative solvers, such as Krylov subspace methods and these iterative solvers converge slowly for the ill-conditioned operators that come from high-order discretizations. Specifically, $p$-multigrid and domain decomposition methods are particularly well suited preconditioners for problems on high-order, unstructured meshes. Local Fourier Analysis (LFA) of these preconditioners analyzes the frequency modes found in the iterate error following the application of these methods and can provide sharp convergence estimates and parameter tuning while only requiring computation on a single representative element or macro-element patch. The performance of $p$-multigrid in parallel is partially determined by the number of multigrid levels. However, aggressive coarsening is not well supported by traditional polynomial smoothers such as the Chebyshev semi-iterative method; aggressive coarsening leaves the smoother responsible for wide range of error frequency modes, which causes degraded smoothing unless a high order polynomial smoother is used. Dirichlet Balancing Domain Decomposition by Constraints (BDDC) enables aggressive coarsening at a computation and communication cost similar to low-degree polynomial smoothers. We provide LFA of $p$-multigrid with Dirichlet BDDC smoothing via LFAToolkit, a Julia package for LFA of finite element operators and preconditioners. This analysis demonstrates the suitability of this combination for preconditioning high-order matrix-free finite element discretizations. Furthermore, we discuss implementation concerns for a matrix-free implementation of Dirichlet BDDC with Fast Diagonalization Method based approximate subdomain solvers in libCEED, a performance portable library for matrix-free implementations of finite element operators. |

13:05 | Adaptive Nonlinear Domain Decomposition Methods PRESENTER: Martin Lanser ABSTRACT. In this talk, different nonlinear domain decomposition methods are applied to nonlinear problems with highly-heterogeneous coefficient functions with jumps. In order to obtain a robust nonlinear domain decomposition method with respect to nonlinear as well as linear convergence, adaptive coarse spaces are employed. First, as an example for a nonlinearly left-preconditioned domain decomposition method, the two-level restricted nonlinear Schwarz method H1-RASPEN (Hybrid Restricted Additive Schwarz Preconditioned Exact Newton) is combined with an adaptive generalized Dryja-Smith-Widlund (GDSW) coarse space. Second, as an example for a nonlinearly right-preconditioned domain decomposition method, a nonlinear FETI-DP (Finite Element Tearing and Interconnecting - Dual Primal) method is equipped with an edge-based adaptive coarse space. Both approaches are compared with the nonlinear domain decomposition methods with classical coarse spaces as well as with the Newton-Krylov methods with adaptive coarse spaces. For two-dimensional problems with different spatial coefficient distributions, it can be observed that the best linear and nonlinear convergence can only be obtained when combining the nonlinear domain decomposition methods with adaptive coarse spaces. |

11:25 | Newton's Method in Mixed-Precision ABSTRACT. We investigate the use of reduced precision arithmetic to solve the linear equation for the Newton step. If one neglects the backward error in the linear solve, then well-known convergence theory implies that using single precision in the linear solve has very little negative effect on the nonlinear convergence rate. However, if one considers the effects of backward error, then the usual textbook estimates are very pessimistic and even the state-of-the-art estimates using probabilistic rounding analysis do not fully conform to experiments. We report on experiments with a specific example. We store and factor Jacobians in double, single, and half precision. In the single precision case we observe that the convergence rates for the nonlinear iteration do not degrade as the dimension increases and that the nonlinear iteration statistics are essentially identical to the double precision computation. In half precision we see that the nonlinear convergence rates, while poor, do not degrade as the dimension increases. |

11:50 | Multistage Mixed Precision Iterative Refinement PRESENTER: Eda Oktay ABSTRACT. Low precision arithmetic, in particular half precision (16-bit) floating point arithmetic, is now available in commercial hardware and can offer significant savings in computation and communication costs with proportional savings in energy. Motivated by this, there has been a renewed interest in mixed precision iterative refinement schemes for solving linear systems, and new variants of GMRES-based iterative refinement have been developed. Each particular variant with a given combination of precisions leads to different condition number-based constraints for convergence of the backward and forward errors, and each has different performance costs. The constraints for convergence given in the literature are, as an artifact of the analyses, often overly strict in practice, and thus could lead a user to select a more expensive variant when a less expensive one would have sufficed. In this talk, we give an overview of existing approaches to mixed precision iterative refinement and present a new multistage mixed precision iterative refinement solver which aims to combine existing mixed precision approaches to balance performance and accuracy and improve usability. For a user-specified initial combination of precisions, the algorithm begins with the least expensive approach and convergence is monitored via inexpensive computations with quantities produced during the iteration. If slow convergence or divergence is detected using particular stopping criteria, the algorithm switches to use a more expensive, but more reliable variant. A novel aspect of our approach is that, unlike existing implementations, our algorithm first attempts to use “stronger” GMRES-based solvers for the solution update before resorting to increasing the precision(s). In some scenarios, this can avoid the need to refactorize the matrix in higher precision. |

12:15 | Mixed-precision interior-point method for linear programming problems PRESENTER: Jeffrey Cornelis ABSTRACT. Primal-dual interior-point methods are efficient algorithms for the solution of linear programming problems. These methods require the solution of a linear system of equations in each iteration. Direct methods, such as the Cholesky factorization, are currently used in most state-of-the-art software. Iterative solution methods, such as the Conjugate Gradient method, have not yet found widespread use in practice, since the severe ill-conditioning of the matrices makes them quite inefficient without specialized preconditioners. The ill-conditioning can be alleviated by adding regularization terms to the linear programming problem. In addition, it is also possible to allow an inexact solution of the linear systems of equations in each iteration. In this talk we exploit the inexactness by using a mixed-precision solver for the linear system of equations. More specifically, we perform a Cholesky factorization in IEEE single precision and use it as a preconditioner for the Conjugate Gradient method, implemented in IEEE double precision. This approach can be seen as some kind of hybrid between a direct method and an iterative method. Numerical experiments illustrate the benefits of this approach applied to linear programming problems with a dense constraint matrix. Since floating point operations in single precision are roughly speaking about twice as fast than double precision, this then leads to a reduction in computational time compared to simply using a Cholesky factorization in double precision. |

12:40 | Quasi-Newton: combining Multi-Secant and Symmetry PRESENTER: Nicolas Boutet ABSTRACT. The known quasi-Newton methods differ in the properties of the estimation of the Jacobian matrix. In this paper, we discuss two of these properties which are widely used in the literature: the symmetry found in most of the methods applied in optimization and the multi-secant property which proves to be particularly efficient in root-finding problems, in particular for interaction problems. The combination of both properties has however been explored only to a very limited extent. We explain why the combination of those properties is mathematically impossible and explore different strategies to approach this combination. |

11:25 | A Layer-Parallel Multi-Level Optimization Approach to Training Deep Neural Networks PRESENTER: Eric Cyr ABSTRACT. Deep neural networks are a powerful machine learning tool with the capacity to “learn” complex nonlinear relationships described by large data sets. Despite their success training these models remains a challenging and computationally intensive undertaking. In recent work, multigrid-in-time (MGRIT) was applied to accelerate the forward and backward propagation for Neural ODEs, and a certain class of recurrent neural networks. This achieved good parallel speedups; however, algorithmically the performance is identical. Simultaneously, other groups have considered more holistic changes to the training strategy using a multilevel optimization strategy known as MGOPT. This uses a hierarchy identical to layer-parallel, but accelerates training algorithmically by choosing search directions in a hierarchical fashion. In this talk we propose the combining layer-parallel with MGOPT to achieve algorithmic and parallel acceleration. We demonstrate our approach on several benchmark problems in machine learning, and explore the applicability in the context of physics-informed neural networks. |

11:50 | Global Convergence and Stability of Stochastic Gradient Descent ABSTRACT. Stochastic gradient descent (SGD) is widely deployed in a number of different disciplines, often on non-convex problems with complicated noise models. However, SGD's existing convergence theory often does not apply. In this talk, we will establish the need for a convergence theory under broader assumptions with some simple examples. We will then state a global convergence result for SGD under these broad assumptions. Then, we will discuss the issue of stability, which addresses what happens when SGD's iterates diverge. |

12:15 | Sampled Limited-Memory Training (slimTrain) of Separable Deep Neural Networks PRESENTER: Elizabeth Newman ABSTRACT. Deep neural network (DNNs) has provided flexible models with excellent approximation properties, especially in high-dimensions. However, realizing the theoretical approximation properties in practice requires training the DNN weights, which can be challenging. The training problem typically is posed as a stochastic optimization problem with respect to the DNN weights. With millions of weights, a non-convex and non-smooth objective function, and many hyperparameters to tune, solving the training problem well requires many iterations, many trials, and significant time and computational resources. In this talk, we exploit the separability of commonly-used DNN architectures to simplify the training process. We call a DNN separable if the weights of the final layer of the DNN are applied linearly, which includes most state-of-the-art networks. We will focus on the stochastic approximation setting and employ a powerful iterative sampling approach, called sampled limited-memory Tikhonov, which notably incorporates automatic regularization parameter selection methods. We will demonstrate the efficacy of slimTrain, our DNN training algorithm, using numerical examples. |

12:40 | Efficient estimation of the intrinsic dimension via Ritz values and orthogonal polynomials PRESENTER: Kadir Özçoban ABSTRACT. This study aims to design a highly efficient and robust intrinsic dimension estimation approach for dimensionality reduction methods widely used in machine learning. Due to their nature, real life data have a complex and non-linear structure. These non-linearities and the high number of features can usually cause problems such as the empty-space phenomenon and the well-known curse of dimensionality. Because of the curse of dimensionality (and related overfitting issues), the accuracy of classical machine learning algorithms (such as classification, clustering, and regression) can dramatically decrease. Finding the optimal representation of the dataset in a lower-dimensional space offers an applicable mechanism for the solution of improving the success of machine learning tasks. This procedure is called dimensionality reduction. However, estimation of the required data dimension for the optimal representation (intrinsic dimension) can be a very costly operation, particularly if one deals with big data. In this study, we first implicitly determine the Ritz values of the data covariance matrix by running the Conjugate Gradient for Least Squares algorithm for a few iterations. This process approximately constructs the eigenvalue search intervals of the data covariance matrix. Then, we run the Chebyshev filtering method for each interval to estimate the number of eigenvalues in that interval. The total variance or equivalently the sum of the eigenvalues of the covariance matrix can be estimated by utilizing some well-known randomized techniques. Therefore, we can now efficiently estimate the intrinsic dimension required for a given target variance without solving the related eigenvalue problem by using the estimated eigenvalue counts in each interval and trace of the data covariance matrix. Our experiments show that the suggested method works well for linear data. In order to adapt it to non-linear data, the proposed approach should be applied locally. That means, firstly, data should be separated into locally linear clusters, and then the method should be applied to each cluster. This improvement also strengthens the parallelization possibilities of our approach. In future studies, we plan to adapt the proposed approach for the generalized eigenvalue problem, which will allow us to use it in a broader perspective for dimensionality reduction studies such as in Manifold Learning. This work is supported by TÜBİTAK (The Scientific and Technological Research Council of Turkey) under grant number 120E281. |