Policy Iterations Without Selection Property

10 pages. Published: October 11, 2018


In this paper, we propose a modified policy iterations algorithm that does not rely on the selection property. The selection property is the key argument for showing that each policy iteration makes an improvement: the new policy is computed as an optimal solution of a minimization problem. In some cases, however, it may be difficult to prove that an optimal solution exists. To overcome this issue, we compute the new policy as a guaranteed sub-optimal solution of the minimization problem. A suitable choice of the perturbation parameters preserves the advantages of the original policy iterations algorithm, such as the computation of a post-fixed point at each step and the convergence to a fixed point.
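
The idea can be illustrated on a toy scalar problem. The sketch below is not the algorithm of the paper; it is a minimal example under stated assumptions. The map F(x) = inf over a in (0, 1] of f_a(x), with f_a(x) = x/2 + a(1 + x), the function names, and the perturbation schedule eps_k = 0.25 * 2^(-k) are all hypothetical choices made for illustration. For x >= 0 the infimum equals x/2 and is never attained, so no optimal policy exists and the improvement step can only return an eps-sub-optimal one.

```python
def F(x):
    # F(x) = inf over a in (0, 1] of f(a, x); for x >= 0 this is x/2,
    # and the infimum is not attained by any policy a.
    return x / 2.0

def f(a, x):
    # Map associated with the (hypothetical) policy a.
    return x / 2.0 + a * (1.0 + x)

def evaluate(a):
    # Policy evaluation: the unique fixed point of x -> f(a, x), valid for 0 < a < 1/2.
    return a / (0.5 - a)

def improve(x, eps):
    # Policy improvement without selection: return a policy a with
    # f(a, x) <= F(x) + eps instead of an exact minimizer.
    return min(eps / (1.0 + x), 0.25)

def policy_iterations(a0=0.25, eps0=0.25, steps=30):
    # Alternate evaluation and eps-sub-optimal improvement,
    # halving the perturbation parameter at each step.
    xs = [evaluate(a0)]
    for k in range(1, steps + 1):
        eps = eps0 * 0.5 ** k
        a = improve(xs[-1], eps)
        xs.append(evaluate(a))
    return xs

if __name__ == "__main__":
    for k, x in enumerate(policy_iterations(steps=10)):
        print(k, x, F(x) <= x)
```

In this toy setting, every iterate x_k satisfies F(x_k) <= x_k (a post-fixed point), the sequence decreases, and it approaches 0, the fixed point of F, as the perturbation parameters tend to 0.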

Keyphrases: approximated and guaranteed optimal solutions, Policy iterations, verification

In: Matthieu Martel, Nasrine Damouche and Julien Alexandre Dit Sandretto (editors). TNC'18. Trusted Numerical Computations, vol 8, pages 1--10

