| 08:00 | Multilevel training methods for large-scale transformer architectures PRESENTER: Graham Harper ABSTRACT. The foundation of most large-scale artificial intelligence products is the transformer architecture, encompassing both generative image and chat models. Transformers utilize attention, which is a mechanism of projecting input data onto subspaces of bases which are learned through the training process. The outputs of attention blocks may then be combined in many different ways, leading to many different variations within the transformer architecture; however, training large-scale transformers is computationally expensive due to the modern "bigger is better" paradigm. We explore modifications for transformers that incorporate multilevel algorithms within the training process which capture a broader spectrum of information to improve computational efficiency and model accuracy. We present general theory and algorithms alongside results focusing on vision transformers. |
| 08:25 | Hierarchical Graph Networks for Scalable Learning on Graphs PRESENTER: Nicolas Nytko ABSTRACT. Graph neural networks have emerged as powerful tools for data-driven learning on semi-structured inputs, such as connectivity graphs derived from the finite element discretization of elliptic PDEs. However, their performance often stalls on large-diameter graphs because their local updates tend to behave similarly to stationary iterative methods such as Jacobi iteration, leading to slow propagation of global information. To address this, we propose a hierarchical cycling architecture inspired by the multigrid V-cycle. This allows the model to efficiently capture and propagate information across all spatial scales, allowing us to effectively learn global properties of the graph. Moreover, we provide numerical results that demonstrate that our proposed method maintains scalability and superior performance for large-scale problems, especially when compared to current graph neural network architectures. |
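The contrast the abstract draws, purely local updates behaving like Jacobi iteration versus a multigrid V-cycle that propagates global information, can be illustrated on a classical model problem. The sketch below is a hedged toy (a 1D Poisson system, not the authors' graph-network architecture): weighted Jacobi sweeps stand in for local message passing, and a two-level cycle adds the long-range coarse correction.

```python
import numpy as np

def poisson1d(n):
    """Tridiagonal [-1, 2, -1] matrix: 1D Poisson on n interior points."""
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

def jacobi(A, x, b, omega=2.0 / 3.0, sweeps=2):
    """Weighted Jacobi: purely local updates, the analogue of a local GNN layer."""
    d_inv = 1.0 / np.diag(A)
    for _ in range(sweeps):
        x = x + omega * d_inv * (b - A @ x)
    return x

def linear_prolongation(n):
    """Linear interpolation from n // 2 coarse points to n fine points."""
    P = np.zeros((n, n // 2))
    for j in range(n // 2):
        i = 2 * j + 1
        P[i, j] = 1.0
        P[i - 1, j] = 0.5
        if i + 1 < n:
            P[i + 1, j] = 0.5
    return P

def two_grid(A, x, b):
    """One V-cycle: smooth, global coarse-grid correction, smooth."""
    P = linear_prolongation(A.shape[0])
    x = jacobi(A, x, b)
    r = b - A @ x
    Ac = P.T @ A @ P                          # Galerkin coarse operator
    x = x + P @ np.linalg.solve(Ac, P.T @ r)  # long-range correction
    return jacobi(A, x, b)
```

A handful of two-grid cycles drives the residual far below what the same number of Jacobi-only sweeps achieves, mirroring the slow propagation of global information that the abstract attributes to purely local updates on large-diameter graphs.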
| 08:50 | Multi-Level Patch Diffusion Models ABSTRACT. Diffusion models are gaining popularity in scientific domains beyond traditional image generation, including learning stochastic differential equations and efficiently sampling high-dimensional distributions. While latent-space diffusion excels in image synthesis, many scientific applications require high-resolution generation directly in physical space to maintain accuracy and avoid the challenges of training encoders. However, direct physical-space diffusion is computationally and memory intensive, especially for large-scale or 3D problems. We propose a patch-based diffusion framework inspired by multilevel methods for numerical PDEs. Unlike standard patch-based approaches that process patches independently and can lose global statistical coherence, we introduce a two-level architecture that couples patches through a coarse-grid Vision Transformer (ViT). Each patch is processed by a shared U-Net encoder–decoder, while the ViT enables cross-patch information exchange in a compressed latent representation. This design preserves global statistical properties while retaining the scalability benefits of domain decomposition. Although the transformer introduces additional memory overhead, patch decomposition supports parallelism and scaling to large domains. The shared U-Net, together with positional encoding, enforces consistent local dynamics across patches. We demonstrate the approach on image benchmarks and scientific datasets (e.g., turbulent flows), where maintaining multiscale statistics and coherent structures is critical, opening new avenues for scalable data-driven modeling in computational science and engineering. |
| 09:15 | Multilevel Training for Kolmogorov Arnold Networks PRESENTER: Eric Cyr ABSTRACT. Algorithmic speedup of training common neural architectures is made difficult by the lack of structure guaranteed by the function compositions inherent to such networks. In contrast to multilayer perceptrons (MLPs), Kolmogorov-Arnold networks (KANs) provide more structure by expanding learned activations in a specified basis. This paper exploits this structure to develop practical algorithms and theoretical insights, yielding training speedup via multilevel training for KANs. To do so, we first establish an equivalence between KANs with spline basis functions and multichannel MLPs with power ReLU activations through a linear change of basis. We then analyze how this change of basis affects the geometry of gradient-based optimization with respect to spline knots. The KANs change-of-basis motivates a multilevel training approach, where we train a sequence of KANs naturally defined through a uniform refinement of spline knots with analytic geometric interpolation operators between models. The interpolation scheme enables a "properly nested hierarchy" of architectures, ensuring that interpolation to a fine model preserves the progress made on coarse models, while the compact support of spline basis functions ensures complementary optimization on subsequent levels. Numerical experiments demonstrate that our multilevel training approach can achieve orders of magnitude improvement in accuracy over conventional methods to train comparable KANs or MLPs, particularly for physics-informed neural networks. Finally, this work demonstrates how principled design of neural networks can lead to exploitable structure, and in this case, multilevel algorithms that can dramatically improve training performance. |
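The "properly nested hierarchy" can be made concrete in the simplest setting: for degree-one (hat-function) splines on uniform knots, a coarse expansion is reproduced exactly on the uniformly refined knot grid by keeping the coarse coefficients at old knots and averaging neighbours at new midpoints. The sketch below covers only this linear-spline case (higher-degree B-splines need the usual knot-insertion weights) and is an illustration, not the paper's interpolation operators.

```python
import numpy as np

def refine_linear_spline(coeffs):
    """Prolongate hat-function coefficients from a uniform knot grid to its
    uniform refinement, reproducing the represented function exactly."""
    coeffs = np.asarray(coeffs, dtype=float)
    fine = np.zeros(2 * len(coeffs) - 1)
    fine[0::2] = coeffs                            # old knots keep their values
    fine[1::2] = 0.5 * (coeffs[:-1] + coeffs[1:])  # midpoints: linear average
    return fine

def eval_linear_spline(coeffs, x):
    """Evaluate the piecewise-linear expansion with uniform knots on [0, 1]."""
    knots = np.linspace(0.0, 1.0, len(coeffs))
    return np.interp(x, knots, coeffs)
```

Because the coarse function is reproduced exactly on the fine grid, prolongation cannot lose coarse-level training progress, which is the nesting property the abstract relies on.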
| 08:00 | An incidence-based edge-averaged method for convection-diffusion problems on polygonal meshes PRESENTER: Eoghan O'Keefe ABSTRACT. The convection-diffusion partial differential equation models transport processes where advective flow is combined with diffusive effects. In convection-dominated regimes, particularly at high Péclet numbers, shocks and numerical instabilities arise that challenge standard discretization techniques such as the Galerkin finite-element method. The edge-averaged finite-element (EAFE) method is a monotone method that resolves these issues on simplicial meshes. In this work, we extend the EAFE method to an incidence-based edge-averaged method that uses the incidence matrix defined on each element of a polygonal mesh. The resulting scheme remains monotone under certain conditions, and we demonstrate its effectiveness via numerical experiments. |
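In one dimension the edge-averaged idea reduces to the classical Scharfetter-Gummel scheme built on the Bernoulli function; the sketch below (a standard textbook construction, not the paper's polygonal extension) shows the monotone, oscillation-free behaviour at a high cell Péclet number where central differences would oscillate.

```python
import numpy as np

def bernoulli(t):
    """Bernoulli function B(t) = t / (exp(t) - 1), with B(0) = 1."""
    return 1.0 if abs(t) < 1e-12 else t / np.expm1(t)

def sg_solve(n, beta):
    """Solve -u'' + beta * u' = 0 on (0,1), u(0)=0, u(1)=1, with the
    Scharfetter-Gummel (1D edge-averaged) scheme on n cells.
    Returns the interior nodal values."""
    h = 1.0 / n
    bp, bm = bernoulli(beta * h), bernoulli(-beta * h)
    m = n - 1                                  # interior unknowns
    # Edge-averaged stencil: diag B(bh)+B(-bh), sub -B(-bh), super -B(bh)
    A = (bp + bm) * np.eye(m) - bm * np.eye(m, k=-1) - bp * np.eye(m, k=1)
    rhs = np.zeros(m)
    rhs[-1] = bp * 1.0                         # contribution of u(1) = 1
    return np.linalg.solve(A, rhs)
```

The off-diagonal entries are nonpositive for any Péclet number, so the discrete solution stays monotone between its boundary values even on coarse meshes.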
| 08:25 | Adaptive Multigrid Finite State Projection for the Reaction-Diffusion Master Equation PRESENTER: Aditya Dendukuri ABSTRACT. We develop an adaptive multigrid method for the Finite State Projection (FSP) of the Reaction-Diffusion Master Equation (RDME), based on a unified framework for spatial aggregation in continuous-time Markov chains. The RDME state space has two distinct hierarchical dimensions: molecule counts (the conventional CME dimension) and spatial position (voxel indices). We show that coarsening the spatial dimension by merging adjacent voxels corresponds to a Kemeny–Snell aggregation in the voxel-index dimension of the Markov chain. This perspective unifies spatial coarsening with quasi-steady-state (QSS) aggregation of molecular-count states, as both arise from collapsing fast-mixing regions of the Markov chain into macro-states. Within this framework, the stationary distribution inside each aggregated region defines the prolongation operator and is given in closed form by a binomial or multinomial distribution induced by fast intra-voxel diffusion. We prove that the resulting coarse generator is exact for linear reaction networks and characterize the closure error for nonlinear systems, showing that it vanishes in the high-copy-number limit. This establishes a direct connection between stochastic reaction kinetics and multilevel operator approximation and supports the recursive construction of spatial hierarchies. The multigrid V-cycle is formulated by splitting the generator into fast and slow components. Fast intra-voxel diffusion acts as a smoothing operator, while the slow component, consisting of inter-voxel diffusion and reactions, is advanced on coarser grids where stiffness is reduced. This decouples fast and slow processes and allows time steps to be chosen according to the slower reaction dynamics rather than fine-scale diffusion limits. 
A key component of the method is an adaptive spatial refinement strategy that resolves the failure modes of naive flux-based criteria. By combining total spatial flux with a per-molecule flux measure, we distinguish inactive regions from structural bottlenecks where few molecules mediate critical transitions. This preserves high-transport interfaces at fine resolution while allowing aggressive coarsening elsewhere and complements existing FSP state-space adaptivity. Numerical experiments demonstrate the effectiveness of the approach. For a one-dimensional birth–death–diffusion system, the method accurately recovers spatial probability profiles while substantially reducing the effective state space. For the spatial Schlögl model, the solver captures front propagation and metastable dynamics by automatically refining near interfaces and coarsening in homogeneous regions. Across stiff regimes, the method enables time steps orders of magnitude larger than those permitted by fine-grid diffusion constraints, yielding substantial speedups without sacrificing FSP error control. More broadly, this work shows how fast-mixing substructures in continuous-time Markov processes can be systematically eliminated to produce reduced models that preserve long-time dynamics while improving computational efficiency. |
| 08:50 | Physics-Based Clustering for the Construction of Multiscale Solvers in Heterogeneous and Multiscale Systems PRESENTER: Maria Vasilyeva ABSTRACT. We present physics-based multiscale clustering algorithms for the reduction and upscaling of heterogeneous and anisotropic systems. The proposed approach partitions the computational domain into balanced subdomains and employs spectral, physics-informed clustering techniques based on local generalized eigenproblems to identify dominant multiscale features. The resulting coarse spaces are constructed using energy-minimizing strategies, including harmonic extensions and constrained formulations inspired by nonlocal multiscale methods. The framework enables efficient multiscale solver construction by identifying low-dimensional, physics-preserving representations. The clustering procedure produces adaptive coarse spaces for iterative and multilevel methods, improving convergence and efficiency. It also supports hierarchical clustering, yielding multilevel representations that enable scalable and robust solvers for complex multiscale systems. |
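The building block of such spectral, physics-informed clustering, a local generalized eigenproblem whose low-energy modes span the coarse space, can be sketched in a few lines of dense linear algebra. This is a generic illustration under the assumption of SPD local matrices; the function name is illustrative, not the paper's API.

```python
import numpy as np

def local_coarse_basis(A_loc, M_loc, k):
    """Low-energy modes of the pencil (A_loc, M_loc): solve the generalized
    eigenproblem A v = lam * M v and keep the k smallest-lam eigenvectors
    as the coarse basis on this subdomain (A_loc, M_loc assumed SPD)."""
    # Reduce to a standard symmetric problem via M^{-1/2}.
    w, V = np.linalg.eigh(M_loc)
    m_inv_sqrt = V @ np.diag(1.0 / np.sqrt(w)) @ V.T
    lam, U = np.linalg.eigh(m_inv_sqrt @ A_loc @ m_inv_sqrt)  # ascending lam
    return m_inv_sqrt @ U[:, :k]   # columns: local coarse basis vectors
```

Stacking such local bases (after harmonic extension or an energy-minimizing correction, as the abstract describes) yields the global coarse space.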
| 09:15 | Fast MBIR via Fourier Reformulation and Multilevel Methods PRESENTER: Dinesh Kumar ABSTRACT. We present a multi-strategy acceleration framework for Model-Based Iterative Reconstruction (MBIR) in tomographic imaging that enables high-fidelity reconstructions at scales and speeds compatible with modern experimental workflows. While MBIR provides superior image quality through physics-based modeling and regularization, its computational cost has historically limited its applicability, particularly for large-scale and time-resolved datasets. Our approach combines four complementary strategies that exploit both mathematical structure and modern hardware. First, we reformulate the forward and adjoint operators using non-uniform fast Fourier transforms (NUFFTs) and leverage the Toeplitz structure of the normal operator to replace repeated projection operations with efficient Fourier-domain convolutions, reducing per-iteration complexity. Second, we introduce a hierarchical multi-resolution scheme in which the reconstruction is solved on progressively finer grids, significantly reducing total iteration counts while preserving accuracy. Third, we employ a ramp-filtered backprojection as an informed initialization, improving conditioning and accelerating convergence compared to naïve starting points. Finally, we develop a distributed-memory implementation using MPI that partitions data across nodes and GPUs, enabling scalable performance on modern high-performance computing systems. In addition, we plan to leverage this Fourier-domain reconstruction framework to develop methods for X-ray magnetic circular dichroism (XMCD) vector tomography, extending these computational gains to emerging vector-field imaging applications. Numerical experiments demonstrate that these strategies provide both constant-factor reductions in per-iteration cost and substantial decreases in total iterations to convergence. 
The combined framework achieves near-linear strong scaling across multiple nodes and enables reconstruction of large volumetric datasets previously infeasible with conventional MBIR implementations. This work illustrates how integrating operator reformulation, hierarchical optimization, and parallel computing can fundamentally shift the computational efficiency of iterative methods, making MBIR practical for high-throughput, undersampled, and real-time tomographic applications. |
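The hierarchical multi-resolution idea, solve on a coarse grid and use the prolonged result to warm-start the next finer one, can be written as a tiny recursion. The sketch below is a generic least-squares stand-in (plain gradient descent and piecewise-constant prolongation), not the MBIR objective or the NUFFT operators of the abstract.

```python
import numpy as np

def prolong(n_fine):
    """Piecewise-constant prolongation: each coarse value covers two fine cells."""
    return np.repeat(np.eye(n_fine // 2), 2, axis=0)

def gd(A, b, x0, iters):
    """Gradient descent on 0.5 * ||Ax - b||^2 with a safe fixed step."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    x = x0.copy()
    for _ in range(iters):
        x -= step * (A.T @ (A @ x - b))
    return x

def multilevel_solve(A, b, levels, iters):
    """Coarse-to-fine: solve (A @ P) xc ~ b on the coarse grid first,
    then prolong xc to warm-start the finer-grid iterations."""
    if levels == 1 or A.shape[1] < 4:
        return gd(A, b, np.zeros(A.shape[1]), iters)
    P = prolong(A.shape[1])
    xc = multilevel_solve(A @ P, b, levels - 1, iters)
    return gd(A, b, P @ xc, iters)
```

The coarse levels are cheap (fewer unknowns) yet remove the smooth bulk of the error, so the fine level starts close to the solution, which is the mechanism behind the reduced total iteration counts the abstract reports.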
| 10:20 | Transforming Smoothers for Elliptic Interface Problems with Jumping Coefficients PRESENTER: Najwa Alshehri ABSTRACT. We consider elliptic interface problems in a fictitious domain formulation with a distributed Lagrange multiplier. The resulting linear system is difficult to solve efficiently without an appropriate preconditioner. In this talk, we present transforming smoothers as an effective preconditioning technique. The approach is based on a suitable transformation of the system that improves its algebraic structure and leads to robust solver performance. In particular, we show that an appropriate choice of preconditioning yields a system with uniformly bounded conditioning independent of mesh refinement, while alternative formulations can lead to significantly degraded performance. Numerical results confirm stable behavior and efficient convergence. |
| 10:45 | Multigrid Reduction Preconditioning for Fully Implicit Magnetohydrodynamics PRESENTER: Victor Magri ABSTRACT. We present a multigrid reduction (MGR) preconditioner for the fully implicit AC-GLM-MHD Jacobian arising from the coupled velocity, pressure, magnetic field, and magnetic potential variables. This formulation integrates artificial compressibility (AC) to couple pressure and velocity via a pseudo-compressibility term and generalized Lagrange multipliers (GLM) to provide a divergence-cleaning mechanism that controls violations of the solenoidal constraint on the magnetic field. The method combines a pressure Schur complement with a GLM-consistent magnetic correction so that both artificial-compressibility stiffness and divergence-cleaning stiffness are treated directly in the preconditioner. The reduced formulation retains the coupled magnetic/GLM block, incorporates pressure stabilization through a scalable pressure operator, and augments the magnetic solve with the correction induced by eliminating the GLM scalar. This design preserves the dominant physical couplings while being cost-effective. The resulting preconditioner is intended for right-preconditioned FGMRES and is designed to remain robust as CFL, artificial compressibility, GLM wave speed, and damping vary across stiff regimes. We assess its robustness and efficiency for large, sparse, non-symmetric Newton systems arising in implicit AC-GLM-MHD simulation, with emphasis on coupled liquid-metal MHD applications. |
| 11:10 | Efficient Multigrid Solvers for Time-Dependent Rayleigh-Bénard Convection PRESENTER: Ahsan Ali ABSTRACT. In this work, we consider the efficient numerical solution of the time-dependent Rayleigh-Bénard convection problem described by the incompressible Boussinesq equations. The system couples the Navier-Stokes equations with a convection-diffusion equation for temperature through a buoyancy term, leading to a nonlinear and strongly coupled problem. Fully implicit Runge-Kutta methods like Radau IIA provide an L-stable family of high-order methods, but these lead to large nonlinear systems at each time step. To address the stage-coupled Jacobians obtained from Newton linearization, we consider two approaches based on preconditioned flexible GMRES: an augmented Lagrangian preconditioner for the Navier-Stokes block and a monolithic multigrid method with Vanka-type relaxation. Numerical results demonstrate robust and efficient performance across a range of physical regimes, including low-Prandtl-number flows, highlighting the effectiveness of combining high-order implicit time integration with tailored multigrid solvers. |
| 11:35 | Quantum optimal preconditioning via oracle-free block-encoding of expanded systems PRESENTER: Seulip Lee ABSTRACT. We study quantum algorithms for solving large-scale linear systems arising from finite-element discretizations of high-dimensional elliptic partial differential equations. These systems are typically sparse but become increasingly ill-conditioned as the problem size grows, leading to high computational costs for both classical and quantum solvers due to their dependence on the condition number. This observation highlights the importance of achieving optimal preconditioning, particularly in the quantum setting, where a uniformly bounded condition number enables exponential speedups. To address this challenge, we introduce an expanded system inspired by multigrid methods, in which an expanding operator yields optimal preconditioning under suitable assumptions on the mesh and coefficients. In contrast to standard oracle-based quantum approaches, we develop an oracle-free block-encoding of the expanding operator using only elementary quantum gates, enabling an efficient construction of a block-encoding for the expanded system. Combined with quantum linear system algorithms, this approach removes dependence on the condition number and achieves polylogarithmic complexity in the system size, potentially yielding an exponential speedup over classical methods in high-dimensional regimes. |
| 15:20 | Accelerating Algebraic Multigrid with Learned Sparse Corrections PRESENTER: Eran Treister ABSTRACT. The scalable solution of large sparse linear systems is a bottleneck in scientific computing and graph analysis. While algebraic multigrid (AMG) offers optimal linear scaling, its performance is severely constrained by the trade-off between the sparsity and convergence quality of coarse-grid operators. Classical AMG heuristics struggle to balance these objectives, often sacrificing stability for sparsity. We propose RAPNet, a graph neural network (GNN) framework that resolves this trade-off by learning to generate sparse, robust coarse operators directly from the sparse algebraic system. Key to our approach is a level-wise training strategy that enables learning from small subgraphs and generalization to million-node domains, bypassing the bottlenecks of prior neural AMG attempts. RAPNet executes exclusively during the solver setup phase, ensuring that the solve phase retains its favorable computational properties. We show that our method outperforms classical non-Galerkin baselines on diverse PDE discretizations and graph Laplacians, making it particularly effective for multi-query tasks such as eigenproblems, time-dependent simulations, and inverse or design problems. |
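The sparsity-versus-quality trade-off the abstract targets is easy to observe directly: a Galerkin triple product RAP with a smoothed prolongation widens the coarse stencil, which is exactly what non-Galerkin (and here, learned) sparsification tries to undo. A minimal SciPy illustration follows, using textbook smoothed aggregation on a 5-point Laplacian; no learning is involved, and nothing here is the RAPNet method itself.

```python
import numpy as np
import scipy.sparse as sp

def laplace2d(n):
    """5-point Laplacian on an n x n grid via Kronecker products."""
    T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n))
    I = sp.identity(n)
    return (sp.kron(I, T) + sp.kron(T, I)).tocsr()

def smoothed_aggregation_P(A, n):
    """2x2 aggregates smoothed by one weighted-Jacobi step."""
    rows = np.arange(n * n)
    i, j = rows // n, rows % n
    cols = (i // 2) * (n // 2) + (j // 2)
    P_agg = sp.csr_matrix((np.ones(n * n), (rows, cols)),
                          shape=(n * n, (n // 2) ** 2))
    D_inv = sp.diags(1.0 / A.diagonal())
    return (sp.identity(n * n) - (2.0 / 3.0) * D_inv @ A) @ P_agg

n = 16
A = laplace2d(n)
P = smoothed_aggregation_P(A, n)
Ac = (P.T @ A @ P).tocsr()   # Galerkin coarse operator: noticeably denser rows
```

The average number of nonzeros per row grows from roughly five on the fine grid to a wider coarse stencil, and the growth compounds over levels, which is the cost a learned sparse replacement for RAP aims to avoid.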
| 15:45 | AN EFFICIENT CUMULATIVE EDGE-DETECTION METHOD FOR EDGE-PRESERVING IMAGE RECONSTRUCTION PRESENTER: Toluwani Okunola ABSTRACT. When reconstructing images from noisy measurements, such as in medical scans or scientific imaging, we face an inverse problem: recovering an unknown image from indirect, corrupted observations. These problems are typically ill-posed, meaning small amounts of noise can lead to wildly inaccurate reconstructions. Regularization techniques address this by incorporating additional assumptions about the desired solution, such as smoothness or sparsity. However, standard regularization methods often blur sharp edges (the boundaries between different tissues, materials, or structures), losing critical detail. A powerful strategy for edge preservation is iterative reweighting, which solves a sequence of weighted ℓ2 subproblems with adaptively updated weights. Existing methods differ primarily in how weights are computed. Non-cumulative schemes, such as majorization-minimization methods for ℓ1-regularization, derive weights from the current iterate alone and can be solved efficiently using solvers like the Recycled Majorization-Minimization Generalized Krylov Subspace algorithm (RMM-GKS). In contrast, the cumulative reweighting approach of Gazzola et al. progressively accumulates edge information across iterations, achieving superior edge preservation but at high computational cost due to deeply nested iterations. This work introduces CR-ℓq-RMM-GKS, a framework that combines cumulative edge detection with computational efficiency. We integrate Gazzola's cumulative weighting strategy with the RMM-GKS solver, which handles general ℓq-norm penalties (0 < q ≤ 2), automatically selects regularization parameters, and recycles Krylov subspaces between outer iterations. This reduces the nested structure to only two levels while extending cumulative weighting to ℓq penalties. 
Numerical experiments in signal deblurring and sparse tomography demonstrate that CR-ℓq-RMM-GKS achieves significantly sharper edges than standard non-cumulative methods. Notably, CR-ℓ1-RMM-GKS outperforms both standard ℓ1 methods and CR-ℓ2-RMM-GKS, showing that cumulative weighting and ℓ1 penalties are highly complementary. |
| 16:10 | Optimal transfer operators and convergence bounds for nonsymmetric two-grid methods PRESENTER: Ludwig Rooch ABSTRACT. Algebraic multigrid methods have proven to be effective solvers for large-scale linear algebraic systems with Hermitian positive definite (HPD) system matrices. For such problems the convergence in the norm induced by the system matrix is well understood, but for nonsymmetric indefinite systems fewer results exist. Recently, convergence results for more general norms induced by certain HPD matrices were established. There, orthogonal projections built by compatible transfer operators are used. Here, we present a theoretical framework for nonsymmetric algebraic two-grid methods for arbitrary inner products and induced norms which naturally includes the HPD case and all recent results for the nonsymmetric case. For this purpose, we consider two different two-grid error operators, the first one being the natural generalization of the error operator in the HPD case. The second operator has been studied before and is simpler, but requires the additional assumption of normality of the smoothing step to achieve convergence. We explain the differences and similarities of both operators, the necessity of the extra condition and generalize some previous results. Moreover, we give sharp estimates for the norms of the error propagation matrices and establish optimal interpolation and restriction operators for both methods. Finally, we analyze the effect of the number of coarse variables on the convergence speed. This talk is based on a joint work with Reinhard Nabben (TU Berlin). |
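For orientation, the error operator being generalized is, in the HPD setting, the classical two-grid propagator; written in standard notation (smoother \(M\), prolongation \(P\), restriction \(R\)), given here only as background and not as the talk's new result:

```latex
E_{\mathrm{TG}}
  \;=\;
  \bigl(I - M^{-1}A\bigr)\,
  \bigl(I - P A_c^{-1} R A\bigr)\,
  \bigl(I - M^{-1}A\bigr),
\qquad
A_c = R A P .
```

In the HPD case with \(R = P^{*}\), the middle factor \(I - P A_c^{-1} P^{*} A\) is the \(A\)-orthogonal projection onto the complement of \(\operatorname{range}(P)\) and convergence is measured in the \(A\)-norm; the framework described above replaces the \(A\)-inner product by an arbitrary one.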