
Hasnaa Zidani (ENSTA Paris, France)

Jerome Bolte (Universite Toulouse 1 Capitole, France)

10:30 | Long term dynamics of the subgradient optimization method in the non-smooth and non-convex case ABSTRACT. The advent of huge-scale, non-convex, non-smooth and poorly structured optimization problems (e.g. deep learning) has triggered a revival of interest in the non-smooth explicit subgradient method of Shor. This problem appears to be much more delicate than implicit versions (like the proximal gradient and PALM algorithms). Indeed, it features highly oscillatory behavior even under rigidity assumptions such as semi-algebraicity of the objective function. In recent work, we analyze the algorithm under fairly weak assumptions and prove some convergence properties. To do this we use closed measures, a tool taken from geometric measure theory that we expect to be useful also for the analysis of convergence of other algorithms involving sequences in Euclidean spaces.
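The explicit subgradient method at the heart of this talk is simple to state; the following toy sketch (our illustration, not the authors' code) shows on f(x) = |x| the oscillation phenomenon that makes the analysis delicate.

```python
# A minimal sketch (ours, not the authors') of Shor's explicit subgradient
# method on the non-smooth convex toy objective f(x) = |x|.

def subgradient(x):
    # An element of the subdifferential of |x| (we pick 0 at x = 0).
    return 1.0 if x > 0 else (-1.0 if x < 0 else 0.0)

def subgradient_method(x0, step, n_iter):
    x = x0
    for k in range(n_iter):
        x -= step(k) * subgradient(x)
    return x

# A constant step keeps the iterates oscillating around the minimizer 0 ...
x_const = subgradient_method(5.0, lambda k: 0.4, 1000)
# ... while decreasing steps 1/(k+1) drive the oscillation amplitude to 0.
x_decr = subgradient_method(5.0, lambda k: 1.0 / (k + 1), 1000)
```

With the constant step the iterates end up cycling around zero with amplitude of the order of the step, which is exactly the oscillatory behavior the abstract refers to.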

11:00 | Constant Along Primal Rays Conjugacies Suitable for Functions of the Support PRESENTER: Michel De Lara ABSTRACT. The support of a vector in~$\mathbb{R}^d$ is the set of indices with nonzero entry. Functions of the support are widespread in sparse optimization, like the so-called $\ell_0$~pseudonorm which counts the number of nonzero components of a vector. Functions of the support of a vector are $0$-homogeneous. Because of that, the Fenchel conjugacy fails to provide relevant analysis. In this paper, we display a class of conjugacies that are suitable for functions of the support. For this purpose, we suppose given a (source) norm on~$\mathbb{R}^d$. With this norm, we define, on the one hand, a family of so-called local coordinate-$K$ norms (with \( K \subset \{1,\ldots,d\} \)) and, on the other hand, a coupling between~$\mathbb{R}^d$ and $\mathbb{R}^d$, called Capra (constant along primal rays). Then, we provide formulas for the Capra-conjugate and biconjugate, and for the Capra-subdifferentials, of functions of the support in terms of the local coordinate-$K$ norms. The special case of nondecreasing submodular functions of the support is considered.

11:30 | Prox-regular sets: separation and metric properties ABSTRACT. It has been well recognized that prox-regular sets (also known as sets of positive reach, weakly convex, O(2)-convex, phi-convex, or proximally smooth sets) play an important role in many areas of mathematical analysis, including optimization, set-valued analysis, differential geometry and PDEs. Roughly speaking, a (closed) subset C of a Hilbert space is prox-regular when the metric projection onto C is single-valued on an open neighborhood of C. In this talk, we present diverse new metric properties that prox-regular sets share with convex ones: - a closed convex set is completely determined by its support function; - the analytic formulation of the distance from a convex set; - the distance from a set C to a vector x (outside of C) coincides with the maximum of the distances taken over all hyperplanes H separating C and x, and this maximum is attained for one and only one hyperplane.

12:00 | HJB equations on stratified domains PRESENTER: Othmane Jerhaoui ABSTRACT. In this talk, we are interested in some control problems on stratified domains. In such problems, the state space is partitioned into different open sets separated by an interface, that is, a collection of lower-dimensional manifolds. Each open region is associated with a compact control set, a controlled dynamics and a cost function. The simplest example of such a structure (in dimension 2) is two open and disjoint half-planes with a common boundary which is a line; see [1, 3]. We are interested in characterizing the value function of the optimal control problem as the solution of an adequate Hamilton-Jacobi equation. More precisely, we prove that the essential Hamiltonian used in [4] and [2] characterizes both the sub- and supersolutions. Finally, we show that a strong comparison principle holds even when the solution in the viscosity sense (the value function) is only lower semicontinuous.

10:30 | Online and Stochastic Optimization Beyond Lipschitz Continuity: A Riemannian Approach PRESENTER: Kimon Antonakopoulos ABSTRACT. Motivated by applications to machine learning and imaging science, we study a class of online and stochastic optimization problems with loss functions that are not Lipschitz continuous; in particular, the loss functions encountered by the optimizer could exhibit gradient singularities or be singular themselves. Drawing on tools and techniques from Riemannian geometry, we examine a Riemann–Lipschitz (RL) continuity condition which is tailored to the singularity landscape of the problem's loss functions. In this way, we are able to tackle cases beyond the Lipschitz framework provided by a global norm, and we derive optimal regret bounds and last-iterate convergence results through the use of regularized learning methods (such as online mirror descent). These results are subsequently validated in a class of stochastic Poisson inverse problems that arise in imaging science.
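As a point of reference for the regularized learning methods mentioned at the end of the abstract, here is a minimal sketch (ours) of online mirror descent with the usual entropic regularizer on the probability simplex; the fixed loss vector is invented, and the Riemannian adaptation to singularities is not shown.

```python
import math

def mirror_descent_step(weights, grad, eta):
    # Entropic mirror descent = multiplicative-weights update followed by
    # renormalization back onto the probability simplex.
    new = [w * math.exp(-eta * g) for w, g in zip(weights, grad)]
    total = sum(new)
    return [w / total for w in new]

w = [1.0 / 3] * 3
loss_grad = [1.0, 2.0, 3.0]          # fixed linear loss; coordinate 0 is best
for _ in range(200):
    w = mirror_descent_step(w, loss_grad, eta=0.1)
# the iterate concentrates on the best coordinate while staying on the simplex
```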

11:00 | A nonlinearly-constrained bundle method for nonconvex chance-constrained problems PRESENTER: Paul Javal ABSTRACT. We consider a class of nonconvex chance-constrained optimization problems whose involved functions can be represented as Difference-of-Convex (DC) functions. A DC function approximating the probability constraint is investigated, and the resulting DC-constrained optimization problem is handled by a new infeasible bundle method based on so-called improvement functions. The proposed algorithm neither employs penalization techniques nor solves subproblems with linearized constraints. The approach is assessed on several (nonconvex) chance-constrained programs.

11:30 | Hard Shape-Constrained Kernel Regression PRESENTER: Pierre-Cyril Aubin-Frankowski ABSTRACT. Shape-constrained regression problems, such as requiring a function to be positive/monotone/convex over an interval, arise in a large number of applications, including econometrics and finance. Leveraging prior knowledge expressed in terms of shape structures has several advantages, but the nondiscrete set on which the constraints have to hold makes the estimation challenging. For problems where the hypothesis space is a reproducing kernel Hilbert space, we show how second-order cone programming techniques can be applied to solve a strengthened version satisfying the imposed shape constraints. We also prove performance guarantees and apply the method to joint quantile regression and in the context of transportation systems.

Rida Laraki (Université Dauphine - CNRS, France)

10:30 | Stability of marketable payoffs with re-trading PRESENTER: Jean-Marc Bonnisseau ABSTRACT. We consider a stochastic financial exchange economy with a finite date-event tree representing time and uncertainty, and a financial structure with possibly long-term assets. We address the question of the stability of financial structures, that is, the continuity of the set of marketable payoffs with respect to the asset prices. In a previous paper, we exhibited a sufficient condition based only on the returns of the assets. However, it is never satisfied in structures with re-trading (see Bonnisseau–Chéry), which is a very common feature in many papers following the model presented in the book of Magill and Quinzii. The main purpose of this paper is to deepen the study of the stability of financial structures with long-term assets and, especially, to address this issue in cases where assets are re-traded after their issuance at different nodes and different dates. We exhibit a new sufficient condition on general financial structures, which has the property of being inherited by the re-trading extension of a financial structure. We also show that the situation is more complex when not all assets are issued at the initial date, and we then provide a stronger condition to cover this case.

11:00 | Diffusion on large networks PRESENTER: Michel Grabisch ABSTRACT. We investigate the phenomenon of diffusion in a countably infinite society of individuals interacting with their neighbors in a network. At a given time, each individual is either active or inactive. The diffusion is driven by two characteristics: the network structure and the diffusion mechanism represented by an aggregation function. We distinguish between two diffusion mechanisms (probabilistic, deterministic) and focus on two types of aggregation functions (strict, Boolean). Under strict aggregation functions, polarization of the society cannot happen, and its state evolves towards a mixture of infinitely many active and infinitely many inactive agents, or towards a homogeneous society. Under Boolean aggregation functions, the diffusion process becomes deterministic and the contagion model of Morris (2000) becomes a particular case of our framework. Polarization can then happen. Our dynamics also allow for cycles in both cases. The network structure is not relevant for these questions, but is important for establishing irreducibility, at the price of a richness assumption: the network should contain infinitely many complex stars and have enough space for storing local configurations.
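To make the Boolean (deterministic) mechanism concrete, here is a toy simulation — on a finite cycle rather than the countably infinite networks of the talk — where a node becomes active as soon as some neighbour is active; the network and activation rule are our invention for illustration.

```python
def step(active, n):
    # One synchronous update on a cycle of n nodes: a node activates if at
    # least one of its two neighbours is active (activation is permanent).
    new = set(active)
    for i in range(n):
        if {(i - 1) % n, (i + 1) % n} & active:
            new.add(i)
    return new

n = 20
active = {0}                 # a single initially active seed
for _ in range(n):
    active = step(active, n)
# contagion from one seed eventually reaches the whole cycle
```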

11:30 | Sequential network formation: new stability concepts PRESENTER: Julien Fixary ABSTRACT. For twenty years, the literature on networks has been very active. Part of this literature models network formation using a game-theoretic approach: each node in the network is an agent, and every agent is given a payoff function (a representation of its linking preferences) and chooses to form links with other agents based upon it. A standard modeling approach is to consider the notion of \textit{society}, which can be seen as the counterpart of a normal-form game in network formation and which is composed of a \textit{set of agents} $N$ (a finite subset of $\mathbb{N}$), a \textit{set of weights} $W$ (a subset of $[0,1]$), a \textit{set of links} $L$ (a subset of $\{\{i,j\} \subset N \ | \ i \neq j\}$, where $\{i,j\} \in L$ is denoted $ij$) and, for every agent $i \in N$, a \textit{payoff function} $v_{i}$ (a mapping from the \textit{set of networks} $\mathbb{G} := W^{L}$ to $\mathbb{R}$). In this context, Jackson and Wolinsky introduced a strategic notion of stability called \textit{pairwise stability}. Briefly, a network $g \in \mathbb{G}$ is pairwise stable if for every link $ij$, $(1)$ no agent $k \in ij$ has a strict interest (regarding $v_{k}$) in decreasing the weight $g(ij)$ attributed to $ij$, and $(2)$ agents $i$ and $j$ do not have a common strict interest (regarding $v_{i}$ and $v_{j}$) in increasing the weight $g(ij)$ attributed to $ij$. The purpose of our work is to extend this concept to a sequential framework. First, we introduce a possible counterpart of an extensive-form game in network formation that we call a \textit{sequential society}. Intuitively, we consider a network formation process in $T$ steps ($T \in \mathbb{N}^{\ast}$) such that to each step $t$ corresponds a link $i_{t}j_{t}$. Agents $i_{t}$ and $j_{t}$ have to choose a weight $w_{t}$ given all possible choices for the links $i_{\tau}j_{\tau}$ ($\tau < t$). Second, given a sequential society, we introduce two possible counterparts of the concept of subgame perfect equilibrium in network formation that we call \textit{sequentially pairwise stable equilibrium} (or \textit{SPSE}) and \textit{weak sequentially pairwise stable equilibrium} (or \textit{weak SPSE}). These two concepts share some features with the concept of subgame perfect equilibrium and with the concept of pairwise stability introduced by Jackson and Wolinsky. By definition of pairwise stability, it can be expected that finding an SPSE (or a weak SPSE) leads to an unusual optimization problem. We study these notions and the existence of elements satisfying them in three different cases (depending on the set of weights $W$): when $W$ is $\{0,1\}$ (\textit{unweighted case}), when $W$ is a finite subset of $[0,1]$ (\textit{discretely weighted case}), and when $W$ is $[0,1]$ (\textit{weighted case}).

Hasnaa Zidani (ENSTA Paris, France)

14:00 | Parameter Estimation for Dynamic Resource Allocation in Microorganisms: A Bi-level Optimization Problem PRESENTER: Terence Bayen ABSTRACT. Given their key role in almost all ecosystems and in several industries, understanding and predicting microorganism growth is of utmost importance. In compliance with evolutionary principles, coarse-grained or genome-scale models of microbial growth can be used to determine optimal resource allocation schemes under dynamic environmental conditions. Resource allocation models have given important qualitative results, but a gap remains towards quantitative predictions. The first step in this direction is parameter calibration with experimental data. But fitting these models results in a bi-level optimization problem, whose numerical resolution involves complex optimization issues. As a case study, we present here a coarse-grained model describing how microalgae acclimate to a change in light intensity. Using the Pontryagin maximum principle and numerical simulations, we first determine the optimal strategy, corresponding to a turnpike with a chattering arc. Then, a bi-level optimization problem is proposed to calibrate the model with experimental data. To solve it, a classical parameter identification routine is used, calling at each iteration the Bocop solver to solve the optimal control problem (by a direct method). The calibrated model is able to represent the photoacclimation dynamics of the microalga facing a down-shift in light intensity.

14:30 | Optimization of Palliative Cancer Therapy: the Logic of Containing Tumors PRESENTER: Yannick Viossat ABSTRACT. Many cancer treatments are administered at the maximal tolerated dose, and eventually fail as the tumor becomes resistant. When a cure is highly unlikely, an alternative is to use the minimal effective dose, that is, the minimal dose that contains the tumor while ensuring a sufficient quality of life for the patient. The hope is to diminish both treatment toxicity and selection for resistance to treatment, allowing us to control the tumor for a longer time. Experimental results are promising. We study how to maximize the time at which tumor size exceeds a given threshold. For simple models with two types of cells, sensitive and fully resistant to treatment, we find that under very general conditions, containment strategies are optimal or almost optimal. This unifies a number of previous approaches. The clinical gains with respect to more aggressive treatments crucially depend on the tumor growth model. In particular, our results are much more robust to modifications of the basic model for Gompertzian growth than for logistic growth. Finally, if resistant cells retain some sensitivity to treatment, the optimal strategy is to first contain the tumor with a low-dose treatment but eventually switch to the maximal tolerated dose. This is the opposite of what other authors have found, and we will explain why. This is part of the PGMO project ``Optimization of a new type of cancer therapy''.
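A back-of-the-envelope simulation (entirely ours: the equations, parameters and the on-off containment rule are invented for illustration, not taken from the paper) of a sensitive/resistant tumor under Gompertzian growth illustrates the logic of containment: holding the total size near a threshold slows the resistant clone and delays progression relative to the maximal tolerated dose.

```python
import math

def time_to_progression(policy, S0=100.0, R0=1.0, r=0.1, K=1000.0,
                        kill=0.5, progression=300.0, horizon=2000.0, dt=0.1):
    # Euler simulation: S = sensitive cells (killed by treatment u in [0, 1]),
    # R = fully resistant cells; both share a Gompertzian growth rate.
    S, R, t = S0, R0, 0.0
    while t < horizon:
        N = S + R
        if N >= progression:
            return t                      # tumor has reached the threshold
        g = r * math.log(K / N)           # Gompertzian per-capita growth rate
        u = policy(N)
        S += dt * S * (g - kill * u)
        R += dt * R * g
        t += dt
    return horizon

mtd = time_to_progression(lambda N: 1.0)                    # maximal dose
contain = time_to_progression(lambda N: 1.0 if N > 200.0 else 0.0)
# holding the tumor near 200 keeps the resistant growth rate low, so
# progression occurs later under containment than under the maximal dose
```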

15:00 | About optimal crop irrigation planning under water scarcity PRESENTER: Kenza Boumaza ABSTRACT. We study theoretically, on a simple crop model, the optimization problem of maximizing the biomass production at harvesting time under a constraint on the total amount of water used. Under low water availability, we show with the tools of optimal control theory that, despite the lack of differentiability of the dynamics, the optimal solution may present a singular arc, leading to a bang-singular-bang solution. This solution may be better than the simple bang-bang strategies commonly used.

Vincent Leclere (ENPC, France)

14:00 | On the convergence of the stochastic primal-dual hybrid gradient method PRESENTER: Olivier Fercoq ABSTRACT. We analyze the stochastic primal-dual hybrid gradient method (SPDHG) recently proposed by Chambolle et al. and prove new theoretical results. At each iteration, a single coordinate of the dual variable is updated while all coordinates of the primal vector are modified. We assume that the generalized gradient of the Lagrangian is metrically subregular. This working hypothesis is satisfied for linear programming problems, the Lasso, support vector machines, and all smooth and strongly convex problems. Under it, we can prove linear convergence of the method. Notably, the algorithm does not depend on the metric subregularity constant, which explains why linear convergence is observed even when this constant cannot be estimated. In the general case, we prove almost sure convergence of the iterates to a solution of the problem and a sublinear convergence rate. Finally, our numerical experiments show that SPDHG with standard step sizes performs well and is robust compared to the SPDHG-mu variant specialized for strongly convex functions, to stochastic gradient with variance reduction, and to dual coordinate descent.
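For readers unfamiliar with the base algorithm, here is a pure-Python sketch (ours, with invented data) of the deterministic primal-dual hybrid gradient method of Chambolle and Pock applied to min_x lam*||x||_1 + 0.5*||Kx - b||^2; SPDHG randomizes the dual update over coordinates, which is not shown here.

```python
def soft(v, t):
    # proximal operator of t*||.||_1 (soft thresholding)
    return [max(abs(vi) - t, 0.0) * (1.0 if vi > 0 else -1.0) for vi in v]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

K = [[1.0, 0.0], [0.0, 2.0]]
Kt = [list(col) for col in zip(*K)]       # transpose of K
b = [1.0, 4.0]
lam = 0.1
tau = sigma = 0.4                         # tau * sigma * ||K||^2 < 1

x = [0.0, 0.0]; y = [0.0, 0.0]; xbar = list(x)
for _ in range(2000):
    # dual step: prox of sigma * g* for g = 0.5 * ||. - b||^2
    Kx = matvec(K, xbar)
    y = [(yi + sigma * (kx - bi)) / (1.0 + sigma)
         for yi, kx, bi in zip(y, Kx, b)]
    # primal step: prox of tau * lam * ||.||_1, then extrapolation
    Kty = matvec(Kt, y)
    x_new = soft([xi - tau * g for xi, g in zip(x, Kty)], tau * lam)
    xbar = [2.0 * xn - xo for xn, xo in zip(x_new, x)]
    x = x_new
# x approaches the solution of this separable toy problem, about [0.9, 1.975]
```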

14:30 | A Randomized Proximal Gradient Method with Structure-Adapted Sampling ABSTRACT. Many applications in machine learning involve nonsmooth optimization problems. This nonsmoothness induces a low-dimensional structure in the optimal solutions. In this talk, we propose a randomized proximal gradient method harnessing this underlying structure.
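The low-dimensional structure mentioned above can be seen on a tiny lasso example (our sketch of the plain, non-randomized proximal gradient method, with invented data): the soft-thresholding step sets small coordinates exactly to zero, so the iterates identify the sparse support of the solution.

```python
# Plain proximal gradient (ISTA) on min_x 0.5*||x - b||^2 + lam*||x||_1.

def soft_threshold(v, t):
    return [max(abs(vi) - t, 0.0) * (1.0 if vi > 0 else -1.0) for vi in v]

b = [1.0, 0.05]
lam = 0.2
step = 1.0

x = [0.0, 0.0]
for _ in range(10):
    grad = [xi - bi for xi, bi in zip(x, b)]    # gradient of the smooth part
    x = soft_threshold([xi - step * g for xi, g in zip(x, grad)], step * lam)
# the second coordinate is exactly zero: the support has been identified
```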

15:00 | Convergence and Dynamical Behavior of the Adam Algorithm for Nonconvex Stochastic Optimization PRESENTER: Anas Barakat ABSTRACT. Adam is a popular variant of stochastic gradient descent for finding a local minimizer of a function. The objective function is unknown, but a random estimate of the current gradient vector is observed at each round of the algorithm. In the first part, we will focus on the dynamical behavior of the algorithm and its asymptotic analysis. Assuming that the objective function is differentiable and non-convex, we establish the convergence in the long run of the iterates to a stationary point. The key ingredient is the introduction of a continuous-time version of Adam, under the form of a non-autonomous ordinary differential equation. The existence and uniqueness of the solution are established, as well as the convergence of the solution towards the stationary points of the objective function. The continuous-time system is a relevant approximation of the Adam iterates, in the sense that the interpolated Adam process converges weakly to the solution of the ODE. In the second part, we will present a non-asymptotic analysis of a clipped version of Adam. We study the Adam algorithm for smooth nonconvex optimization under a boundedness assumption on the adaptive learning rate. The bound on the adaptive step size depends on the Lipschitz constant of the gradient of the objective function and provides safe theoretical adaptive step sizes. Under this boundedness assumption, we show a novel first-order convergence rate result in both deterministic and stochastic contexts. Furthermore, we establish convergence rates of the function value sequence using the Kurdyka-Lojasiewicz property.
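To fix notation, the Adam iteration analyzed in the talk can be sketched as follows on a deterministic toy quadratic (our example; hyperparameters are the common defaults with a smaller step size).

```python
import math

def adam(grad_fn, x, lr=0.005, b1=0.9, b2=0.999, eps=1e-8, n_iter=4000):
    # Textbook Adam update with bias correction, on a deterministic gradient.
    m = [0.0] * len(x)
    v = [0.0] * len(x)
    for t in range(1, n_iter + 1):
        g = grad_fn(x)
        m = [b1 * mi + (1 - b1) * gi for mi, gi in zip(m, g)]
        v = [b2 * vi + (1 - b2) * gi * gi for vi, gi in zip(v, g)]
        m_hat = [mi / (1 - b1 ** t) for mi in m]
        v_hat = [vi / (1 - b2 ** t) for vi in v]
        x = [xi - lr * mh / (math.sqrt(vh) + eps)
             for xi, mh, vh in zip(x, m_hat, v_hat)]
    return x

# minimize f(x, y) = x**2 + 10*y**2 starting from (3, -2)
x = adam(lambda p: [2.0 * p[0], 20.0 * p[1]], [3.0, -2.0])
```

With a constant learning rate the iterates settle in a small neighborhood of the minimizer rather than converging exactly, which is consistent with the neighborhood-type guarantees of fixed-step analyses.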

Hasnaa Zidani (ENSTA Paris, France)

16:00 | Multi-objective infinite horizon optimal control problems: characterization of the Pareto fronts and reconstruction of optimal trajectories PRESENTER: Ana Paula Chorobura ABSTRACT. We will present a method to compute the Pareto fronts of multi-objective infinite horizon optimal control problems with state constraints. First, a mono-objective auxiliary optimal control problem, free of state constraints, is introduced. The zero level set of the value function of the auxiliary control problem is related to the (weak) Pareto front of the multi-objective optimal control problem. Moreover, a more detailed characterization of the Pareto front for the multi-objective infinite horizon control problem is given. In the infinite horizon context, the value function of the auxiliary optimal control problem satisfies a Hamilton-Jacobi-Bellman equation; however, it is not its unique solution. Usually, numerical approximations are analyzed (convergence and error estimates) under the assumption that a comparison principle holds. A semi-Lagrangian method is proposed to compute the value function of the auxiliary control problem, and the convergence results are proved using only the dynamic programming principle. Furthermore, we discuss an algorithm for reconstructing Pareto-optimal trajectories.

16:30 | Two-player zero-sum deterministic differential game in infinite horizon involving impulse controls with a new QVI PRESENTER: Hafid Lalioui ABSTRACT. We study a two-player zero-sum deterministic differential game in infinite horizon in which both players adopt impulse controls, with cost functions depending on the state variable. We aim to prove that the impulse control problem admits a value. We prove, by means of the dynamic programming principle, that the value functions are continuous and are viscosity solutions of the corresponding Hamilton-Jacobi-Bellman-Isaacs (HJBI) quasi-variational inequality (QVI). We define a new differential QVI for which the value functions are the unique viscosity solution. We then prove that the lower and upper value functions coincide.

17:00 | Primal-Dual algorithm for Lambertian Shape from Shading ABSTRACT. Shape from Shading is a classical problem in computer vision. It consists in reconstructing the 3D shape of an object from a 2D image. A PDE formulation of this problem gives rise to a PDE of the form $H(x,Du) = 0$ on $\Omega$, where $\Omega$ is a domain of $\mathbb{R}^d$ and $H:\Omega\times\mathbb{R}^d\to\mathbb{R}$ is the Hamiltonian. To every Hamiltonian one can associate a Finsler metric $\sigma$ which characterizes the set of all subsolutions of the HJ equation via its dual $\sigma^*$. We show \cite{preprint} that the maximal viscosity subsolution is also the solution of a maximization problem of the form \[ \max_{u} \Big\{ \int_{\Omega} u \, dx :~ \sigma^{*}(x,\nabla u) \leq 1, \ \text{boundary conditions} \Big\}. \] Using duality, we end up with a problem of the form \begin{equation} \label{SP} \inf_{u}\sup_{\phi} \mathcal{F}(u) + \langle\nabla u, \phi\rangle - \mathcal{G}^{*}(\phi) \end{equation} for appropriate functionals $\mathcal{F}$ and $\mathcal{G}$. Problem \eqref{SP} falls into the scope of \cite{cp} and thus offers an alternative to the commonly used approaches to the SfS problem.

Vincent Leclere (ENPC, France)

16:00 | Interior point methods for logistic regression ABSTRACT. Logistic regression is a well-known statistical model used to predict binary outcomes. Fitting logistic regression models reduces to the minimization of an empirical loss penalizing the gap between the values predicted by the statistical model and the observations. The corresponding optimization problem formulates naturally as an (unconstrained) non-linear problem. The numerical resolution of the problem then often falls back on a first-order method. However, if the size of the dataset is not too large, second-order methods have also proven effective. In this talk, we study the resolution of logistic regression problems with a commercial interior point solver, Artelys Knitro. We first provide numerical studies showing that Knitro behaves well when fitting logistic regression models, both with $\ell_1$ and $\ell_2$ penalties. We compare Knitro with the solver L-BFGS-B, currently used in the library Scikit-Learn. In the second part of the talk, we focus on the optimization of regularization hyperparameters. We suppose given a cross-validation evaluation function, based on a validation dataset. By using the implicit function theorem, we derive the sensitivity of the cross-validation loss function w.r.t. the penalty term, thus allowing us to optimize it with a descent algorithm based on BFGS. The devised algorithm is faster than classical methods such as grid search, and outperforms the finite-difference algorithm implemented inside Knitro. Eventually, we outline some ideas to improve interior point algorithms for the resolution of optimization problems arising in machine learning.
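As a minimal illustration of the second-order viewpoint (not Knitro's interior-point algorithm itself), here is Newton's method on the empirical logistic loss for a one-parameter model; the tiny non-separable dataset is invented for the example.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

# toy non-separable 1-D dataset (invented)
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 1, 0, 1, 1]

w = 0.0
for _ in range(20):
    p = [sigmoid(w * x) for x in xs]
    grad = sum((pi - yi) * x for pi, yi, x in zip(p, ys, xs))
    hess = sum(pi * (1.0 - pi) * x * x for pi, x in zip(p, xs))
    w -= grad / hess                    # Newton step on the logistic loss

p = [sigmoid(w * x) for x in xs]
grad = sum((pi - yi) * x for pi, yi, x in zip(p, ys, xs))
# at convergence the gradient of the empirical loss vanishes
```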

16:30 | A stochastic Langevin sampling approach to Median-of-Means estimation ABSTRACT. Big data are at the heart of all substantial progress in current machine learning, with impact on self-driving cars, software outperforming human efficiency standards, and decision making in medicine, law, chess or Go. Nevertheless, data from real-world experiments oftentimes tend to be corrupted with outliers and/or exhibit heavy tails. For instance, data from finance often suffer from heavy-tailed phenomena. Also, gene expression data are usually not clean and may be corrupted by the presence of outliers. Fraud detection is another form of outlier detection. In order to address these challenging data analytics problems, there is an urgent need for methods that automatically correct for such undesirable phenomena. This program was initiated by Catoni for the estimation of the mean on the real line, and by Audibert and Catoni in the least-squares regression framework. To the best of our knowledge, estimation of the mean of a real-valued random variable is still best performed using the Median-of-Means (MoM) approach. In regression models, Median-of-Means can handle outliers or heavy tails in the covariates as well as in the response. In the present work, we propose a new scalable approach to MoM estimation based on resampling at each iteration of a stochastic Langevin sampler. We prove polynomial-time convergence with high probability based on a hitting-time analysis.
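The basic Median-of-Means estimator that the talk builds its Langevin sampling scheme around can be sketched directly (our toy data; the resampling-inside-a-sampler part of the proposed method is not shown).

```python
import random
import statistics

def median_of_means(sample, n_blocks):
    # Split the (shuffled) sample into blocks, average each block, and
    # return the median of the block means.
    data = list(sample)
    random.shuffle(data)
    size = len(data) // n_blocks
    means = [sum(data[i * size:(i + 1) * size]) / size
             for i in range(n_blocks)]
    return statistics.median(means)

random.seed(0)
clean = [random.gauss(1.0, 1.0) for _ in range(990)]
data = clean + [1e6] * 10                # 1% gross outliers
mom = median_of_means(data, 25)          # outliers corrupt few of 25 blocks
naive = sum(data) / len(data)            # the plain mean is ruined
```

Since the 10 outliers can corrupt at most 10 of the 25 blocks, the median of the block means stays close to the true mean 1, while the naive mean is pulled away by orders of magnitude.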

17:00 | A Stochastic Levenberg-Marquardt Method Using Random Models PRESENTER: Clément Royer ABSTRACT. Globally convergent variants of the Gauss-Newton algorithm are often the preferred methods to tackle nonlinear least-squares problems. Among such frameworks, Levenberg-Marquardt and trust-region methods are two well-established, similar paradigms, with analyses that are often close in spirit. Both schemes have been successfully studied when the Gauss-Newton model is replaced by a random model that is only accurate with a given probability. Meanwhile, problems where even the objective value is subject to noise have gained interest, driven by the need for efficient methods in fields such as data assimilation. In this talk, we describe a stochastic Levenberg-Marquardt algorithm that handles noisy objective function values and random models, provided sufficient accuracy is achieved in probability. In particular, if the probability of accurate function estimates and models is sufficiently large, we establish that the proposed algorithm converges globally to a first-order stationary point of the problem with probability one. Furthermore, we bound the expected number of iterations needed to reach an approximate stationary point. Our method relies on both a specific scaling of the regularization parameter and a measure of criticality tailored to least-squares problems. We finally present an application of our method to variational data assimilation, where stochasticity arises from the so-called ensemble methods.
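A deterministic Levenberg-Marquardt iteration, the skeleton that the talk's stochastic variant builds on (our one-parameter toy fit; the stochastic method replaces exact residuals and Jacobians by probabilistically accurate estimates), can be sketched as:

```python
import math

ts = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [math.exp(0.7 * t) for t in ts]     # data generated with true a = 0.7

def residuals(a):
    # residuals of the model y ≈ exp(a * t)
    return [math.exp(a * t) - y for t, y in zip(ts, ys)]

a, mu = 0.0, 1.0                          # initial guess and damping weight
for _ in range(50):
    r = residuals(a)
    J = [t * math.exp(a * t) for t in ts]            # d r_i / d a
    g = sum(Ji * ri for Ji, ri in zip(J, r))         # gradient of 0.5*||r||^2
    H = sum(Ji * Ji for Ji in J)                     # Gauss-Newton Hessian
    step = -g / (H + mu)                             # damped (LM) step
    if sum(ri ** 2 for ri in residuals(a + step)) < sum(ri ** 2 for ri in r):
        a += step                         # success: accept, relax damping
        mu = max(0.5 * mu, 1e-8)
    else:
        mu *= 2.0                         # failure: reject, increase damping
# a converges to the true parameter 0.7
```

The accept/reject test on the sum of squares plays the role of the trust-region-like globalization mentioned in the abstract, with the regularization weight mu adjusted after each trial step.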