Tags: Decision theory under risk, Markov decision processes, Preference modeling, and Sequential decision making
Abstract:
In this work, we study risk-aware sequential decision making in a Markov Decision Process (MDP). Much of the literature solves MDPs by optimizing expected rewards (ER), which assumes neutrality with respect to risk. We instead use a more expressive operator: the Weighted Ordered Weighted Average (WOWA), a parameterized operator that can model a wide range of behaviors, from extreme risk seeking to extreme risk aversion, as well as compromises between the two. This operator thus has a high descriptive capacity, but it is difficult to optimize in an MDP because its non-linearity makes standard solution algorithms sub-optimal. In this paper, we introduce and justify a ranking algorithm that determines an optimal (or nearly optimal) policy for a wide range of attitudes toward risk (averse, seeking, neutral, intermediate) using WOWA. Empirical results illustrate the relevance and efficiency of the approach.
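To illustrate how a single parameter of WOWA can encode different risk attitudes, the following is a minimal sketch that evaluates a finite reward lottery. It is not the ranking algorithm of the paper and does not involve an MDP. It assumes that WOWA is written with a nondecreasing distortion function `phi` on [0, 1] (with `phi(0) = 0` and `phi(1) = 1`) applied to cumulative probabilities of outcomes sorted from best to worst; the function name `wowa`, the choice of `phi`, and the example lottery are all hypothetical.

```python
import math

def wowa(outcomes, probs, phi):
    """WOWA value of a finite lottery under a distortion function phi.

    Assumption: phi is nondecreasing on [0, 1] with phi(0) = 0 and phi(1) = 1.
    """
    # Rank outcomes from best to worst, keeping each probability attached.
    ranked = sorted(zip(outcomes, probs), key=lambda pair: pair[0], reverse=True)
    value, cum = 0.0, 0.0
    for x, p in ranked:
        # Clamp to guard against floating-point drift slightly above 1.
        new_cum = min(cum + p, 1.0)
        # The weight of an outcome is the increment of phi over its
        # cumulative-probability interval.
        value += (phi(new_cum) - phi(cum)) * x
        cum = new_cum
    return value

# Hypothetical lottery over rewards.
rewards = [0.0, 10.0, 100.0]
probs = [0.5, 0.4, 0.1]

neutral = wowa(rewards, probs, lambda t: t)       # identity: expected reward
averse = wowa(rewards, probs, lambda t: t ** 2)   # convex phi: favors worst outcomes
seeking = wowa(rewards, probs, math.sqrt)         # concave phi: favors best outcomes
```

Under these assumptions, the identity distortion recovers the expected reward, a convex `phi` shifts weight toward the worst outcomes (risk aversion), and a concave `phi` shifts weight toward the best outcomes (risk seeking), so that `averse < neutral < seeking` for this lottery.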