SMAI-MODE 2020: JOURNÉES SMAI-MODE 2020
PROGRAM FOR MONDAY, SEPTEMBER 7TH


08:45-09:00 Session MW: Introduction/Welcome
Chairs:
Marianne Akian (INRIA and CMAP, CNRS, Ecole polytechnique, IPP, France)
Hasnaa Zidani (ENSTA Paris, France)
Location: Amphitheater
09:00-10:00 Session MP1: Exposé plénier/Plenary talk
Chairs:
Marianne Akian (INRIA and CMAP, CNRS, Ecole polytechnique, IPP, France)
Jerome Bolte (Universite Toulouse 1 Capitole, France)
Location: Amphitheater
09:00
Claire Mathieu (CNRS, France)
Two-sided matching markets with correlated random preferences have few stable pairs

ABSTRACT. Stable matching in a community consisting of N men and N women is a classical combinatorial problem that has been the subject of intense theoretical and empirical study since its introduction in 1962 in a seminal paper by Gale and Shapley.

In this paper, we study the number of stable pairs, that is, the man/woman pairs that appear in some stable matching. We prove that if the preference lists on one side are generated at random using the popularity model of Immorlica and Mahdian, the expected number of stable edges is bounded by N ln(N) + N, matching the asymptotic value for uniform preference lists. If, in addition, the popularity model is a geometric distribution, then the number of stable edges is O(N) and the incentive to manipulate is limited. If, furthermore, the preference lists on the other side are uniform, then the number of stable edges is asymptotically N up to lower-order terms: most participants have a unique stable partner, and hence no incentive to manipulate.

This is joint work with Hugo Gimbert and Simon Mauras.
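
For readers less familiar with the setting, here is a minimal sketch of the Gale-Shapley deferred-acceptance procedure underlying the notion of stable matching (all names illustrative); the random preference lists below are uniform and purely illustrative, not the popularity model studied in the talk.

import random

def gale_shapley(men_prefs, women_prefs):
    """Men-proposing deferred acceptance; returns a stable matching as a dict woman -> man.
    Illustrative sketch, not the construction analyzed in the talk."""
    n = len(men_prefs)
    # rank[w][m] = position of man m in woman w's preference list (lower is better)
    rank = [{m: r for r, m in enumerate(women_prefs[w])} for w in range(n)]
    next_proposal = [0] * n          # index of the next woman each man will propose to
    match = {}                       # current partial matching: woman -> man
    free_men = list(range(n))
    while free_men:
        m = free_men.pop()
        w = men_prefs[m][next_proposal[m]]
        next_proposal[m] += 1
        if w not in match:
            match[w] = m
        elif rank[w][m] < rank[w][match[w]]:
            free_men.append(match[w])  # the previous partner becomes free again
            match[w] = m
        else:
            free_men.append(m)         # w rejects m; he stays free
    return match

# Illustrative use with uniform random preference lists.
n = 5
men_prefs = [random.sample(range(n), n) for _ in range(n)]
women_prefs = [random.sample(range(n), n) for _ in range(n)]
print(gale_shapley(men_prefs, women_prefs))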

10:00-10:30 Pause/Break
10:30-12:30 Session MM1: Variational and Nonsmooth analysis
Chairs:
Jerome Bolte (Universite Toulouse 1 Capitole, France)
Hasnaa Zidani (ENSTA Paris, France)
Location: Room 1
10:30
Rodolfo Rios-Zertuche (Artificial and Natural Intelligence Toulouse Institute, France)
Long term dynamics of the subgradient optimization method in the non-smooth and non-convex case

ABSTRACT. The advent of huge-scale, nonconvex, nonsmooth and poorly structured optimization problems (e.g. deep learning) has triggered a revival of interest in Shor's explicit nonsmooth subgradient method. This method appears to be much more delicate than implicit versions (such as the proximal gradient and PALM algorithms): it exhibits highly oscillatory behavior even under rigidity assumptions such as semi-algebraicity of the objective function. In recent work, we analyze the algorithm under fairly weak assumptions and prove some convergence properties. To do so, we use closed measures, a tool from geometric measure theory that we expect to also be useful for the convergence analysis of other algorithms involving sequences in Euclidean spaces.
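
For reference, here is a minimal sketch of the explicit subgradient iteration discussed above (all names illustrative), run on a simple convex nonsmooth example f(x) = ||x||_1 with diminishing step sizes; the talk concerns far more general nonsmooth, nonconvex objectives.

import numpy as np

def subgradient_method(subgrad, x0, steps=1000):
    """Explicit subgradient method x_{k+1} = x_k - t_k g_k with diminishing steps t_k = 1/(k+1).
    Illustrative sketch only."""
    x = np.asarray(x0, dtype=float)
    for k in range(steps):
        g = subgrad(x)
        x = x - g / (k + 1)
    return x

# Example: f(x) = ||x||_1; sign(x) is a valid subgradient.
x_final = subgradient_method(lambda x: np.sign(x), x0=np.array([2.0, -3.0, 0.5]))
print(x_final)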

11:00
Michel De Lara (Ecole des Ponts ParisTech, CERMICS, France)
Jean-Philippe Chancelier (Ecole des Ponts ParisTech, CERMICS, France)
Constant Along Primal Rays Conjugacies Suitable for Functions of the Support
PRESENTER: Michel De Lara

ABSTRACT. The support of a vector in~$\mathbb{R}^d$ is the set of indices with nonzero entry. Functions of the support are widespread in sparse optimization, like the so-called $\ell_0$~pseudonorm which counts the number of nonzero components of a vector. Functions of the support of a vector have the property of being $0$-homogeneous. Because of that, the Fenchel conjugacy fails to provide relevant analysis. In this paper, we display a class of conjugacies that are suitable for functions of the support. For this purpose, we suppose given a (source) norm on~$\mathbb{R}^d$. With this norm, we define, on the one hand, a family of so-called local coordinate-$K$ norms (with \( K \subset \{1,\ldots,d\} \)) and, on the other hand, a coupling between~$\mathbb{R}^d$ and $\mathbb{R}^d$, called Capra (constant along primal rays). Then, we provide formulas for the Capra-conjugate and biconjugate, and for the Capra-subdifferentials, of functions of the support in terms of the local coordinate-$K$ norms. The special case of nondecreasing submodular functions of the support is considered.
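
As a hedged illustration of the objects involved, the $\ell_0$ pseudonorm and the Capra coupling induced by a source norm $\|\cdot\|$ can be written as follows (our reading of the construction; the authors' precise definitions may differ in detail):
\[
\ell_0(x) \;=\; \#\bigl\{\, i \in \{1,\ldots,d\} : x_i \neq 0 \,\bigr\},
\qquad
c(x,y) \;=\;
\begin{cases}
\dfrac{\langle x, y \rangle}{\|x\|} & \text{if } x \neq 0,\\[1ex]
0 & \text{if } x = 0,
\end{cases}
\]
so that $c(\lambda x, y) = c(x,y)$ for all $\lambda > 0$: the coupling is constant along primal rays, which makes it compatible with $0$-homogeneous functions such as $\ell_0$.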

11:30
Florent Nacry (University Perpignan Via Domitia, France)
Prox-regular sets: separation and metric properties

ABSTRACT. It has been well recognized that prox-regular sets (also known as sets of positive reach, weakly convex, O(2)-convex, phi-convex, or proximally smooth sets) play an important role in many areas of mathematical analysis, including Optimization, Set-Valued Analysis, Differential Geometry and PDEs. Roughly speaking, a (closed) subset C of a Hilbert space is said to be uniformly prox-regular of constant r provided that the nearest point mapping Proj_C is single-valued on a suitable neighborhood (depending on r) of the set C and continuous therein.

In this talk, we present several new metric properties that prox-regular sets share with convex ones:

- A closed convex set is completely determined by its support function;

- The analytic formulation of the distance from a convex set (see the formula after this list);

- The distance from a set C to a vector x (outside of C) coincides with the maximum of the distances from x to the hyperplanes H separating C and x, and this maximum is attained for one and only one hyperplane.
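
In the convex case, the analytic formulation of the distance mentioned in the list above can be written via the support function $\sigma_C(v) = \sup_{c \in C} \langle v, c \rangle$ (a standard convex-analysis identity; the talk discusses its prox-regular counterparts):
\[
d(x, C) \;=\; \sup_{\|v\| \leq 1} \bigl( \langle v, x \rangle - \sigma_C(v) \bigr).
\]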

12:00
Othmane Jerhaoui (ENSTA/UMA, France)
Hasnaa Zidani (ENSTA/UMA, France)
HJB equations on stratified domains
PRESENTER: Othmane Jerhaoui

ABSTRACT. In this talk, we are interested in some control problems on stratified domains. In such problems, the state space is partitioned into different open sets separated by an interface, that is, a collection of lower-dimensional manifolds. Each open region is associated with a compact control set, a controlled dynamics and a cost function. The simplest example of such a structure (in dimension 2) is two open and disjoint half-planes with a common boundary line, see [1, 3]. We are interested in characterizing the value function of the optimal control problem as the solution of an adequate Hamilton-Jacobi equation. More precisely, we prove that the essential Hamiltonian used in [4] and [2] characterizes both the sub- and supersolutions. Finally, we show that a strong comparison principle holds even when the solution in the viscosity sense (the value function) is only lower semicontinuous.
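
For orientation, on each open region $\Omega_i$ with control set $A_i$, dynamics $f_i$ and running cost $\ell_i$ (notation illustrative), the value function satisfies a Hamilton-Jacobi-Bellman equation of the usual form, written here, purely for illustration, for an infinite-horizon problem with discount rate $\lambda > 0$:
\[
\lambda\, u(x) + \sup_{a \in A_i} \bigl\{ -\langle f_i(x,a), D u(x) \rangle - \ell_i(x,a) \bigr\} = 0 \quad \text{in } \Omega_i .
\]
The delicate point, addressed in the talk through the essential Hamiltonian, is which equation must be imposed on the interfaces between regions.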

10:30-12:30 Session MM2: Stochastic Optimization and Learning
Chairs:
Vincent Leclere (ENPC, France)
Vianney Perchet (ENS Paris-Saclay, France)
Location: Room 2
10:30
Kimon Antonakopoulos (INRIA, France)
Panayotis Mertikopoulos (CNRS, France)
E. Veronica Belmega (ETIS / ENSEA - Université de Cergy-Pontoise - CNRS, France)
Online and Stochastic Optimization Beyond Lipschitz Continuity: A Riemannian Approach

ABSTRACT. Motivated by applications to machine learning and imaging science, we study a class of online and stochastic optimization problems with loss functions that are not Lipschitz continuous; in particular, the loss functions encountered by the optimizer could exhibit gradient singularities or be singular themselves. Drawing on tools and techniques from Riemannian geometry, we examine a Riemann–Lipschitz (RL) continuity condition which is tailored to the singularity landscape of the problem’s loss functions. In this way, we are able to tackle cases beyond the Lipschitz framework provided by a global norm, and we derive optimal regret bounds and last iterate convergence results through the use of regularized learning methods (such as online mirror descent). These results are subsequently validated in a class of stochastic Poisson inverse problems that arise in imaging science.
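
As a concrete point of comparison, here is a minimal sketch of entropic online mirror descent on the probability simplex (all names illustrative), i.e. the classical norm-Lipschitz setting rather than the Riemann-Lipschitz one developed in the talk.

import numpy as np

def online_mirror_descent(loss_grads, d, step=0.1):
    """Entropic online mirror descent on the simplex: multiplicative-weights updates.
    Illustrative sketch only."""
    x = np.full(d, 1.0 / d)            # start at the barycenter of the simplex
    iterates = [x.copy()]
    for grad in loss_grads:            # grad(x) returns a (sub)gradient of the current loss at x
        g = grad(x)
        x = x * np.exp(-step * g)      # mirror step with the entropic regularizer
        x = x / x.sum()                # Bregman projection back onto the simplex
        iterates.append(x.copy())
    return iterates

# Illustrative use: linear losses <c_t, x> with random cost vectors.
rng = np.random.default_rng(0)
losses = [(lambda x, c=rng.standard_normal(3): c) for _ in range(100)]
print(online_mirror_descent(losses, d=3)[-1])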

11:00
Paul Javal (EDF R&D, France)
Wim van Ackooij (EDF R&D, France)
Sophie Demassey (CMA, Mines ParisTech, France)
Hugo Morais (Técnico Lisboa, Portugal)
Welington de Oliveira (Mines ParisTech, France)
Bhargav Swaminathan (EDF R&D, France)
A nonlinearly-constrained bundle method for nonconvex chance-constrained problems
PRESENTER: Paul Javal

ABSTRACT. We consider a class of nonconvex chance-constrained optimization problems whose involved functions can be represented as Difference-of-Convex (DC) functions. A DC function approximating the probability constraint is investigated, and the resulting DC-constrained optimization problem is handled by a new infeasible bundle method based on so-called improvement functions. The proposed algorithm neither employs penalization techniques nor solves subproblems with linearized constraints. The approach is assessed on several (nonconvex) chance-constrained programs.
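
Schematically, and with illustrative notation, the problem treated is of the DC-constrained form
\[
\min_{x \in X} \; f_1(x) - f_2(x)
\quad \text{s.t.} \quad g_1(x) - g_2(x) \leq 0,
\]
with $f_1, f_2, g_1, g_2$ convex, where $g_1 - g_2$ stands in for an approximation of $p - \mathbb{P}[c(x,\xi) \leq 0]$; the exact approximation and the improvement-function machinery are those of the authors' work and are not reproduced here.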

11:30
Pierre-Cyril Aubin-Frankowski (MINES ParisTech, France)
Zoltan Szabo (CNRS, Ecole Polytechnique, France)
Hard Shape-Constrained Kernel Regression

ABSTRACT. Shape-constrained regression problems, such as requiring a function to be positive/monotone/convex over an interval, arise in a large number of applications, for instance in econometrics or finance. Leveraging prior knowledge expressed in terms of shape structures has several advantages, but the nondiscrete set on which the constraints have to hold makes the estimation challenging. For problems where the hypothesis space is a reproducing kernel Hilbert space, we show how second-order cone programming techniques can be applied to solve a strengthened version of the problem satisfying the imposed shape constraints. We also prove performance guarantees and apply the method to joint quantile regression and to problems arising in transportation systems.

10:30-12:30 Session MM3: Economy and Games
Chairs:
Marianne Akian (INRIA and CMAP, CNRS, Ecole polytechnique, IPP, France)
Rida Laraki (Université Dauphine - CNRS, France)
Location: Room 3
10:30
Jean-Marc Bonnisseau (Université Paris 1 Panthéon-Sorbonne, France)
Achis Chéry (Université d'Etat d'Haiti, (LS2E), Port-au-Prince, Haiti and CREGED, Université Quisqueya, Haiti)
Stability of marketable payoffs with re-trading

ABSTRACT. We consider a stochastic financial exchange economy with a finite date-event tree representing time and uncertainty, and a financial structure with possibly long-term assets. We address the question of the stability of financial structures, that is, the continuity of the set of marketable payoffs with respect to the asset prices. In a previous paper, we have exhibited a sufficient condition, which is based only on the returns of the assets. However, it is never satisfied in structures with re-trading (see Bonnisseau-Chéry), which is a very common feature in many papers following the model presented in the book of Magill and Quinzii. The main purpose of this paper is to deepen the study of the stability of financial structures with long-term assets and, especially, to address this issue in cases where assets are re-traded after their issuance at different nodes and different dates. We exhibit a new sufficient condition on general financial structures, which enjoys the property of being inherited by the re-trading extension of a financial structure. We also show that the situation is more complex when not all initial assets are issued at the initial date, and we then provide a stronger condition to cover this case.

11:00
Michel Grabisch (Université Paris I Panthéon-Sorbonne, Paris School of Economics, France)
Agnieszka Rusinowska (Paris School of Economics, CNRS, France)
Xavier Venel (Université Paris I Panthéon-Sorbonne, Paris School of Economics, France)
Diffusion on large networks
PRESENTER: Michel Grabisch

ABSTRACT. We investigate the phenomenon of diffusion in a countably infinite society of individuals interacting with their neighbors in a network. At a given time, each individual is either active or inactive. The diffusion is driven by two characteristics: the network structure and the diffusion mechanism represented by an aggregation function. We distinguish between two diffusion mechanisms (probabilistic, deterministic) and focus on two types of aggregation functions (strict, Boolean). Under strict aggregation functions, polarization of the society cannot happen, and its state evolves towards a mixture of infinitely many active and infinitely many inactive agents, or towards a homogeneous society. Under Boolean aggregation functions, the diffusion process becomes deterministic and the contagion model of Morris (2000) becomes a particular case of our framework. Polarization can then happen. Our dynamics also allows for cycles in both cases. The network structure is not relevant for these questions, but is important for establishing irreducibility, at the price of a richness assumption: the network should contain infinitely many complex stars and have enough space for storing local configurations.
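
As a toy illustration of the Boolean (threshold) mechanism under which the model reduces to Morris-type contagion, here is a simulation on a small finite network (all names illustrative); the talk itself concerns countably infinite societies, where the analysis is probabilistic and measure-theoretic rather than computational.

import numpy as np

def threshold_diffusion(adj, active, theta=0.5, steps=20):
    """Deterministic Boolean diffusion: a node becomes active when the fraction of its
    active neighbors reaches the threshold theta. Monotone toy version (no deactivation),
    whereas the general model in the talk also allows deactivation and cycles."""
    adj = np.asarray(adj, dtype=float)
    state = np.asarray(active, dtype=float)
    degrees = adj.sum(axis=1)
    for _ in range(steps):
        frac_active = adj @ state / np.maximum(degrees, 1.0)
        state = np.where(frac_active >= theta, 1.0, state)
    return state

# Illustrative use on a small ring network with one initially active node.
n = 8
adj = np.zeros((n, n))
for i in range(n):
    adj[i, (i - 1) % n] = adj[i, (i + 1) % n] = 1
print(threshold_diffusion(adj, active=[1] + [0] * (n - 1), theta=0.5))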

11:30
Julien Fixary (Université Paris 1 Panthéon-Sorbonne, France)
Philippe Bich (Université Paris 1 Panthéon-Sorbonne, France)
Sequential network formation: new stability concepts
PRESENTER: Julien Fixary

ABSTRACT. For the past twenty years, the literature on networks has been very active. Part of this literature models network formation using a game-theoretic approach: each node in the network is an agent, and every agent is given a payoff function (a representation of its linking preferences) and chooses to form links with other agents based upon it. A standard modelization is to consider the notion of \textit{society}, which can be seen as the counterpart of a normal-form game in network formation and which is composed of a \textit{set of agents} $N$ (a finite subset of $\mathbb{N}$), a \textit{set of weights} $W$ (a subset of $[0,1]$), a \textit{set of links} $L$ (a subset of $\{\{i,j\} \subset N \ | \ i \neq j\}$, where $\{i,j\} \in L$ is denoted $ij$) and, for every agent $i \in N$, a \textit{payoff function} $v_{i}$ (a mapping from the \textit{set of networks} $\mathbb{G} := W^{L}$ to $\mathbb{R}$). In this context, Jackson and Wolinsky introduced a strategic notion of stability called \textit{pairwise stability}. Briefly, a network $g \in \mathbb{G}$ is pairwise stable if for every link $ij$, $(1)$ no agent $k \in ij$ has a strict interest (regarding $v_{k}$) in decreasing the weight $g(ij)$ attributed to $ij$, and $(2)$ agents $i$ and $j$ do not have a common strict interest (regarding $v_{i}$ and $v_{j}$) in increasing the weight $g(ij)$ attributed to $ij$. The purpose of our work is to extend the idea of this concept to a sequential framework.

First, we introduce a possible counterpart of an extensive-form game in network formation that we call a \textit{sequential society}. Intuitively, we consider a network formation process in $T$ steps ($T \in \mathbb{N}^{\ast}$) such that to each step $t$ corresponds a link $i_{t}j_{t}$. Agents $i_{t}$ and $j_{t}$ have to choose a weight $w_{t}$ given every possible choice made for the links $i_{\tau}j_{\tau}$ ($\tau < t$).

Second, given a sequential society, we introduce two possible counterparts of the concept of subgame perfect equilibrium in network formation, which we call \textit{sequentially pairwise stable equilibrium} (or \textit{SPSE}) and \textit{weak sequentially pairwise stable equilibrium} (or \textit{weak SPSE}). These two concepts share some features with the concept of subgame perfect equilibrium and with the concept of pairwise stability introduced by Jackson and Wolinsky. By definition of pairwise stability, it can be expected that finding a SPSE (or a weak SPSE) leads to an unusual optimization problem. We study these notions and the existence of elements satisfying them in three different cases (depending on the set of weights $W$): when $W$ corresponds to $\{0,1\}$ (\textit{unweighted case}), when $W$ is a finite subset of $[0,1]$ (\textit{discretely weighted case}) and when $W$ corresponds to $[0,1]$ (\textit{weighted case}).
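
For reference, in the unweighted case $W = \{0,1\}$ the pairwise stability conditions of Jackson and Wolinsky, which the sequential notions above extend, read as follows (a standard reminder; the weighted version replaces adding or deleting a link by increasing or decreasing its weight):
\[
\begin{aligned}
&\text{(i)}\ \ \forall\, ij \in g:\quad v_i(g) \geq v_i(g - ij) \ \text{ and } \ v_j(g) \geq v_j(g - ij),\\
&\text{(ii)}\ \forall\, ij \notin g:\quad v_i(g + ij) > v_i(g) \ \Longrightarrow\ v_j(g + ij) < v_j(g).
\end{aligned}
\]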

12:30-14:00 Repas/Lunch
14:00-15:30 Session MA11: Optimal control and applications
Chairs:
Jean-Baptiste Caillau (Université Côte d'Azur, CNRS, Inria, LJAD, France)
Hasnaa Zidani (ENSTA Paris, France)
Location: Room 1
14:00
Terence Bayen (University of Avignon, France)
Francis Mairet (IFREMER Nantes, France)
Parameter Estimation for Dynamic Resource Allocation in Microorganisms: A Bi-level Optimization Problem
PRESENTER: Terence Bayen

ABSTRACT. Given their key role in almost all ecosystems and in several industries, understanding and predicting microorganism growth is of utmost importance. In compliance with evolutionary principles, coarse-grained or genome-scale models of microbial growth can be used to determine optimal resource allocation schemes under dynamic environmental conditions. Resource allocation models have given important qualitative results, but a gap remains towards quantitative predictions. The first step in this direction is parameter calibration with experimental data. But fitting these models results in a bi-level optimization problem, whose numerical resolution involves complex optimization issues. As a case study, we present here a coarse-grained model describing how microalgae acclimate to a change in light intensity. Using the Pontryagin maximum principle and numerical simulations, we first determine the optimal strategy, which corresponds to a turnpike with a chattering arc. Then, a bi-level optimization problem is proposed to calibrate the model with experimental data. To solve it, a classical parameter identification routine is used, calling the bocop solver at each iteration to solve the optimal control problem (by a direct method). The calibrated model is able to represent the photoacclimation dynamics of the microalga facing a down-shift of light intensity.
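
Schematically, and with purely illustrative notation, the calibration problem has the bi-level structure
\[
\min_{\theta} \;\sum_{k} \bigl\| y^{\mathrm{data}}(t_k) - y_\theta\bigl(t_k; u^{\star}_{\theta}\bigr) \bigr\|^2
\qquad \text{where} \qquad
u^{\star}_{\theta} \in \arg\max_{u} \; J_\theta(u)
\ \ \text{s.t.}\ \ \dot{y} = f_\theta(y,u),
\]
the inner problem being the optimal resource-allocation control problem solved, for each candidate parameter vector $\theta$, by a direct method.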

14:30
Yannick Viossat (Université Paris-Dauphine, France)
Robert Noble (ETH Zurich, Switzerland)
Optimization of Palliative Cancer Therapy: the Logic of Containing Tumors
PRESENTER: Yannick Viossat

ABSTRACT. Many cancer treatments are administered at the maximal tolerated dose, and eventually fail as the tumor becomes resistant. When a cure is highly unlikely, an alternative is to use the minimal effective dose, that is, the minimal dose that contains the tumor while ensuring a sufficient quality of life for the patient. The hope is to diminish both treatment toxicity and selection for resistance to treatment, allowing the tumor to be controlled for a longer time. Experimental results are promising. We study how to maximize the time at which the tumor size exceeds a given threshold. For simple models with two types of cells, sensitive and fully resistant to treatment, we find that under very general conditions, containment strategies are optimal, or almost optimal. This unifies a number of previous approaches. The clinical gain with respect to more aggressive treatments crucially depends on the tumor growth model. In particular, our results are much more robust to modifications of the basic model for Gompertzian growth than for logistic growth. Finally, if resistant cells retain some sensitivity to treatment, the optimal strategy is to first contain the tumor with a low-dose treatment but eventually switch to the maximal tolerated dose. This is the opposite of what other authors have found, and we will explain why. This is part of the PGMO project ``Optimization of a new type of cancer therapy''.
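
For context, the two growth laws contrasted above are, in their standard form (a reminder, with growth rate $r$ and carrying capacity $K$; the models in the talk add a treatment-induced kill term):
\[
\text{logistic: } \frac{dN}{dt} = r N \Bigl(1 - \frac{N}{K}\Bigr),
\qquad
\text{Gompertzian: } \frac{dN}{dt} = r N \ln\!\Bigl(\frac{K}{N}\Bigr).
\]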

15:00
Kenza Boumaza (Univ Montpellier, INRA, France)
Alain Rapaport (INRA, France)
Sébastien Roux (INRA, France)
About optimal crop irrigation planning under water scarcity
PRESENTER: Kenza Boumaza

ABSTRACT. We study theoretically, on a simple crop model, the optimization problem of maximizing the biomass production at harvesting time under a constraint on the total amount of water used. Under low water availability, we show with the tools of optimal control theory that, despite the lack of differentiability of the dynamics, the optimal solution may present a singular arc, which leads to a bang-singular-bang solution. This solution may be better than the simple bang-bang strategies commonly used.
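
In schematic form (notation purely illustrative, not the authors' exact model), the problem is of the type
\[
\max_{u(\cdot)\in[0,1]} \; B(T)
\quad \text{s.t.} \quad \dot{x} = f(x,u), \qquad \int_0^T u(t)\,dt \leq \bar{Q},
\]
where $B(T)$ is the biomass at harvesting time $T$, $u$ the irrigation control, and $\bar{Q}$ the total amount of water available.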

14:00-15:30 Session MA21: Numerical Optimization/Stochastic Gradient method 1
Chairs:
Marianne Akian (INRIA and CMAP, CNRS, Ecole polytechnique, IPP, France)
Vincent Leclere (ENPC, France)
Location: Room 2
14:00
Olivier Fercoq (Télécom Paris, France)
Ahmet Alacaoglu (Ecole Polytechnique Fédérale de Lausanne, Switzerland)
Volkan Cevher (Ecole Polytechnique Fédérale de Lausanne, Switzerland)
On the convergence of the stochastic primal-dual hybrid gradient method
PRESENTER: Olivier Fercoq

ABSTRACT. We analyze the stochastic primal-dual hybrid gradient method (SPDHG) recently proposed by Chambolle et al. and prove new theoretical results.

At each iteration, a single coordinate of the vector of dual variables is updated, while all coordinates of the primal vector are modified.

We assume that the generalized gradient of the Lagrangian is metrically subregular. This working assumption is satisfied for linear optimization problems, the Lasso, support vector machines, and all smooth and strongly convex problems.

We can thus prove the linear convergence of the method. Notably, the algorithm does not depend on the metric subregularity constant, which explains why linear convergence is observed even when one is not able to estimate this constant.

In the general case, we prove almost sure convergence of the iterates to a solution of the problem, together with a sublinear convergence rate.

Finally, our numerical experiments show that SPDHG with standard step sizes performs well and is robust compared to the SPDHG-mu variant specialized for strongly convex functions, to stochastic gradient methods with variance reduction, and to dual coordinate descent.
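
For orientation, for a problem of the form $\min_x f(x) + \sum_i g_i(A_i x)$, one SPDHG iteration has, schematically, the structure below; step-size choices and the exact extrapolation rule are those of Chambolle et al. and are not reproduced here:
\[
x^{k+1} = \operatorname{prox}_{\tau f}\bigl(x^{k} - \tau A^{\top}\bar{y}^{k}\bigr), \qquad
y_{i}^{k+1} = \operatorname{prox}_{\sigma_{i} g_{i}^{*}}\bigl(y_{i}^{k} + \sigma_{i} A_{i} x^{k+1}\bigr)
\ \text{ for one randomly drawn block } i, \qquad
y_{j}^{k+1} = y_{j}^{k} \ (j \neq i),
\]
followed by an extrapolation step producing $\bar{y}^{k+1}$ from $y^{k+1}$ and $y^{k}$.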

14:30
Franck Iutzeler (Univ. Grenoble Alpes, France)
A Randomized Proximal Gradient Method with Structure-Adapted Sampling

ABSTRACT. Many applications in machine learning involve nonsmooth optimization problems. This nonsmoothness brings a low-dimensional structure to the optimal solutions. In this talk, we propose a randomized proximal gradient method harnessing this underlying structure.
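
A minimal sketch of a plain (not structure-adapted) randomized proximal gradient iteration for $\min_x f(x) + \lambda \|x\|_1$ is shown below (all names illustrative); the sampling distribution adapted to the active low-dimensional structure is the contribution of the talk and is not reproduced here.

import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def randomized_prox_gradient(grad_f, x0, lam, step, iters=1000, rng=None):
    """Coordinate-wise proximal gradient: at each iteration a random coordinate is
    updated with a gradient step followed by soft-thresholding. Illustrative sketch."""
    rng = rng or np.random.default_rng(0)
    x = np.asarray(x0, dtype=float).copy()
    d = x.size
    for _ in range(iters):
        i = rng.integers(d)   # uniform sampling; structure-adapted sampling would bias this choice
        g = grad_f(x)
        x[i] = soft_threshold(x[i] - step * g[i], step * lam)
    return x

# Illustrative use on a tiny least-squares problem f(x) = 0.5 * ||A x - b||^2.
rng = np.random.default_rng(1)
A, b = rng.standard_normal((20, 5)), rng.standard_normal(20)
x_hat = randomized_prox_gradient(lambda x: A.T @ (A @ x - b), np.zeros(5), lam=0.1, step=0.01)
print(x_hat)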

15:00
Anas Barakat (Télécom Paris, France)
Pascal Bianchi (Télécom Paris, France)
Convergence and Dynamical Behavior of the ADAM Algorithm for Non Convex Stochastic Optimization
PRESENTER: Anas Barakat

ABSTRACT. Adam is a popular variant of the stochastic gradient descent for finding a local minimizer of a function. The objective function is unknown but a random estimate of the current gradient vector is observed at each round of the algorithm.

In the first part, we will focus on the dynamical behavior of the algorithm and its asymptotic analysis. Assuming that the objective function is differentiable and non-convex, we establish the convergence in the long run of the iterates to a stationary point. The key ingredient is the introduction of a continuous-time version of Adam, under the form of a non-autonomous ordinary differential equation. The existence and the uniqueness of the solution are established, as well as the convergence of the solution towards the stationary points of the objective function. The continuous-time system is a relevant approximation of the Adam iterates, in the sense that the interpolated Adam process converges weakly to the solution to the ODE.

In the second part, we will present a non-asymptotic analysis of a clipped version of Adam. We study the Adam algorithm for smooth nonconvex optimization under a boundedness assumption on the adaptive learning rate. The bound on the adaptive step size depends on the Lipschitz constant of the gradient of the objective function and provides safe theoretical adaptive step sizes. Under this boundedness assumption, we show a novel first order convergence rate result in both deterministic and stochastic contexts. Furthermore, we establish convergence rates of the function value sequence using the Kurdyka-Lojasiewicz property.
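
For reference, here is a minimal sketch of the standard Adam update (exponential moving averages of the gradient and of its square, bias correction, coordinate-wise adaptive step); the clipped variant analyzed in the second part additionally bounds the adaptive step size. All names are illustrative.

import numpy as np

def adam(grad, x0, steps=1000, alpha=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Standard Adam iteration. Illustrative sketch, not the authors' code."""
    x = np.asarray(x0, dtype=float).copy()
    m = np.zeros_like(x)
    v = np.zeros_like(x)
    for t in range(1, steps + 1):
        g = grad(x)                        # (possibly noisy) gradient estimate
        m = beta1 * m + (1 - beta1) * g    # first-moment moving average
        v = beta2 * v + (1 - beta2) * g**2 # second-moment moving average
        m_hat = m / (1 - beta1**t)         # bias correction
        v_hat = v / (1 - beta2**t)
        x = x - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return x

# Illustrative use on a smooth nonconvex toy function f(x) = sin(x1) + x2^2.
print(adam(lambda x: np.array([np.cos(x[0]), 2 * x[1]]), x0=np.array([1.0, 1.0])))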

16:00-17:30 Session MA12: Optimal control and Hamilton-Jacobi equations
Chairs:
Jean-Baptiste Caillau (Université Côte d'Azur, CNRS, Inria, LJAD, France)
Hasnaa Zidani (ENSTA Paris, France)
Location: Room 1
16:00
Ana Paula Chorobura (ENSTA Paris, France)
Hasnaa Zidani (Ensta Paris, France)
Multi-objective infinite horizon optimal control problems: characterization of the Pareto fronts and reconstruction of optimal trajectories

ABSTRACT. We will present a method to compute the Pareto fronts of multi-objective infinite horizon optimal control problems with state constraints. First, a mono-objective auxiliary optimal control problem, free of state constraints, is introduced. The zero level set of the value function of the auxiliary control problem is related to the (weak) Pareto front of the multi-objective optimal control problem. Moreover, a more detailed characterization of the Pareto front for the multi-objective infinite horizon control problem is given. In the infinite horizon context, the value function of the auxiliary optimal control problem satisfies a Hamilton-Jacobi-Bellman equation; however, it is not the unique solution. Usually, numerical approximations are analyzed (convergence and error estimates) under the assumption that a comparison principle holds. A semi-Lagrangian method is proposed to compute the value function of the auxiliary control problem, and the convergence results are proved using only the Dynamic Programming Principle. Furthermore, we discuss an algorithm for the reconstruction of Pareto optimal trajectories.
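
For orientation, a prototypical semi-Lagrangian scheme for an infinite-horizon, discounted HJB equation is the fixed-point iteration sketched below (illustrative form and notation; the scheme used for the auxiliary problem in the talk follows the same principle):
\[
v_{h}(x) \;=\; \min_{a \in A} \Bigl\{ (1 - \lambda h)\, v_{h}\bigl(x + h f(x,a)\bigr) + h\, \ell(x,a) \Bigr\},
\]
where $h > 0$ is the time step, $\lambda$ the discount rate, and the right-hand side is evaluated by interpolation on a spatial grid.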

16:30
Hafid Lalioui (Université Ibn Zohr, Equipe aide à la decision, ENSA, Agadir, Morocco)
Brahim El Asri (Université Ibn Zohr, Equipe aide à la decision, ENSA, Agadir, Morocco)
Two-player zero-sum deterministic differential game in infinite horizon involving impulse controls with a new QVI
PRESENTER: Hafid Lalioui

ABSTRACT. We study a two-player zero-sum deterministic differential game in infinite horizon in which both players use impulse controls and the cost functions depend on the state variable. We aim to prove that the impulse control problem admits a value. We prove, by means of a dynamic programming principle, that the value functions are continuous and are viscosity solutions of the corresponding Hamilton-Jacobi-Bellman-Isaacs (HJBI) quasi-variational inequality (QVI). We define a new differential QVI for which the value functions are the unique viscosity solution. We then prove that the lower and upper value functions coincide.

17:00
Hamza Ennaji (Limoges University, France)
Primal-Dual algorithm for Lambertian Shape from Shading

ABSTRACT. Shape from Shading (SfS) is a classical problem in computer vision. It consists in reconstructing the 3D shape of an object from a 2D image. A PDE formulation of this problem gives rise to a PDE of the form $H(x,Du) = 0$ on $\Omega$, where $\Omega$ is a domain of $\mathbb{R}^d$ and $H:\Omega\times\mathbb{R}^d\to\mathbb{R}$ is the Hamiltonian. To every Hamiltonian one can associate a Finsler metric $\sigma$ which characterizes the set of all subsolutions of the HJ equation via its dual $\sigma^*$. We show \cite{preprint} that the maximal viscosity subsolution is also the solution of a maximization problem of the form \[ \max_{u} \Bigl\{ \int_{\Omega} u \, dx \; : \; \sigma^{*}(x,\nabla u) \leq 1 \text{ in } \Omega, \text{ plus boundary conditions} \Bigr\}. \] Using duality, we end up with a problem of the form \begin{equation} \label{SP} \inf_{u}\sup_{\phi} \; \mathcal{F}(u) + \langle\nabla u, \phi\rangle - \mathcal{G}^{*}(\phi) \end{equation}

for appropriate functionals $\mathcal{F}$ and $\mathcal{G}$. The problem \eqref{SP} falls into the scope of \cite{cp} and thus offers an alternative to the commonly used approaches to tackle the SfS problem.
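
For completeness, the primal-dual algorithm of \cite{cp} applied to \eqref{SP} consists, schematically, of the iteration (with step sizes $\sigma, \tau > 0$ and extrapolation parameter $\theta \in [0,1]$; a sketch, details as in the cited works):
\[
\phi^{k+1} = \operatorname{prox}_{\sigma \mathcal{G}^{*}}\bigl(\phi^{k} + \sigma \nabla \bar{u}^{k}\bigr), \qquad
u^{k+1} = \operatorname{prox}_{\tau \mathcal{F}}\bigl(u^{k} + \tau\, \operatorname{div} \phi^{k+1}\bigr), \qquad
\bar{u}^{k+1} = u^{k+1} + \theta\,\bigl(u^{k+1} - u^{k}\bigr).
\]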

16:00-17:30 Session MA22: Numerical Optimization/Stochastic Gradient method 2
Chairs:
Marianne Akian (INRIA and CMAP, CNRS, Ecole polytechnique, IPP, France)
Vincent Leclere (ENPC, France)
Location: Room 2
16:00
François Pacaud (Artelys, France)
Interior point methods for logistic regression

ABSTRACT. Logistic regression is a well-known statistical model used to predict binary outcomes. Fitting logistic regression models reduces to the minimization of an empirical loss penalizing the gap between the values predicted by the statistical model and the observations. The corresponding optimization problem formulates naturally as an (unconstrained) nonlinear problem. The numerical resolution of the problem then often falls back on a first-order method. However, if the size of the dataset is not too large, second-order methods have also proven to be effective. In this talk, we study the resolution of logistic regression problems with a commercial interior point solver, Artelys Knitro. We first provide numerical studies showing that Knitro behaves well when fitting logistic regression models, both with $\ell_1$ and $\ell_2$ penalties. We compare Knitro with the solver L-BFGS-B, currently used in the library Scikit-Learn. In the second part of the talk, we focus on the optimization of regularization hyperparameters. We suppose given a cross-validation evaluation function, based on a validation dataset. By using the implicit function theorem, we derive the sensitivity of the cross-validation loss function w.r.t. the penalty term, thus allowing us to optimize it with a descent algorithm based on BFGS. The devised algorithm is faster than classical methods such as grid searches, and outperforms the finite-difference algorithm implemented inside Knitro. Finally, we outline some ideas to improve interior point algorithms for the resolution of optimization problems arising in machine learning.
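
In formulas (notation illustrative), the $\ell_2$-penalized fitting problem and the hyperparameter sensitivity obtained from the implicit function theorem read
\[
w^{\star}(\lambda) \in \arg\min_{w} \; L(w,\lambda) := \sum_{i=1}^{n} \log\bigl(1 + e^{-y_i x_i^{\top} w}\bigr) + \frac{\lambda}{2}\|w\|_2^2,
\qquad
\frac{d}{d\lambda}\, C\bigl(w^{\star}(\lambda)\bigr)
= -\nabla_w C\bigl(w^{\star}\bigr)^{\!\top} \bigl[\nabla^2_{ww} L\bigr]^{-1} \nabla^2_{w\lambda} L,
\]
where $C$ denotes the cross-validation loss and, for the $\ell_2$ penalty, $\nabla^2_{w\lambda} L = w^{\star}$.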

16:30
Stephane Chretien (National Physical Laboratory and Laboratoire ERIC, Universite Lyon 2, UK)
A stochastic Langevin sampling approach to Median of Means estimation

ABSTRACT. Big Data are at the heart of all substantial progress in current machine learning, with impact on self-driving cars, software outperforming human efficiency standards, and decision making in medicine, law, chess or Go. Nevertheless, data from real-world experiments oftentimes tend to be corrupted with outliers and/or exhibit heavy tails. For instance, data from finance often suffer from heavy-tailed phenomena. Also, gene expression data are usually not clean and may be corrupted by the presence of outliers. Fraud detection is another form of outlier detection.

In order to address these challenging data analytics problems, there is an urgent need for methods that automatically correct for such undesirable phenomena. This program was initiated by Catoni for the estimation of the mean on the real line, and by Audibert and Catoni in the least-squares regression framework. To the best of our knowledge, estimation of the mean of a real-valued random variable is still best performed using the Median-of-Means (MoM) approach. In regression models, Median of Means can handle outliers or heavy tails in the covariates as well as in the response.

In the present work, we propose a new scalable approach to MoM estimation based on resampling at each iteration of a stochastic Langevin sampler. We prove polynomial time convergence with high probability based on hitting time analysis.
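
For reference, here is a minimal sketch of the plain Median-of-Means estimator of a univariate mean (all names illustrative), which the proposed Langevin-based approach aims to scale up.

import numpy as np

def median_of_means(samples, n_blocks=10, rng=None):
    """Split the data into blocks at random, average within each block, and return
    the median of the block means (robust to outliers and heavy tails). Illustrative sketch."""
    rng = rng or np.random.default_rng(0)
    samples = np.asarray(samples, dtype=float)
    perm = rng.permutation(samples.size)
    blocks = np.array_split(samples[perm], n_blocks)
    return np.median([block.mean() for block in blocks])

# Illustrative use on heavy-tailed data (Student t with 2 degrees of freedom, true mean 0).
rng = np.random.default_rng(42)
data = rng.standard_t(df=2, size=10_000)
print(median_of_means(data, n_blocks=20, rng=rng))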

17:00
El Houcine Bergou (INRA, France)
Youssef Diouane (ISAE-SUPAERO, France)
Vyacheslav Kungurtsev (Czech Technical University in Prague, Czechia)
Clément Royer (Université Paris-Dauphine, France)
A stochastic Levenberg-Marquardt Method Using Random Models
PRESENTER: Clément Royer

ABSTRACT. Globally convergent variants of the Gauss-Newton algorithm are often the preferred methods to tackle nonlinear least squares problems. Among such frameworks, Levenberg-Marquardt and trust-region methods are two well-established, similar paradigms, with analyses that are often close in spirit. Both schemes have been successfully studied when the Gauss-Newton model is replaced by a random model, only accurate with a given probability. Meanwhile, problems where even the objective value is subject to noise have gained interest, driven by the need for efficient methods in fields such as data assimilation.

In this talk, we describe a stochastic Levenberg-Marquardt algorithm that handles noisy objective function values and random models, provided sufficient accuracy is achieved in probability. In particular, if the probability of accurate function estimates and models is sufficiently large, we establish that the proposed algorithm converges globally to a first-order stationary point of the problem with probability one. Furthermore, we bound the expected number of iterations needed to reach an approximate stationary point. Our method relies on both a specific scaling of the regularization parameter and a measure of criticality tailored to least-squares problems. We finally present an application of our method to variational data assimilation, where stochasticity arises from the so-called ensemble methods.
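
For reference, given residuals $r(x)$ and Jacobian $J(x)$ of a least-squares objective $\tfrac{1}{2}\|r(x)\|^2$ (notation illustrative), a Levenberg-Marquardt step solves the regularized normal equations
\[
\bigl(J_k^{\top} J_k + \gamma_k I\bigr)\, s_k = -\,J_k^{\top} r_k,
\qquad x_{k+1} = x_k + s_k \ \text{if the step is accepted},
\]
where, in the stochastic setting of the talk, $J_k$ and $r_k$ are random model and function estimates, and the specific scaling of the regularization parameter $\gamma_k > 0$ is part of the proposed method and is not reproduced here.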