International Journal of Computational Intelligence Research (IJCIR)

Volume 3, Number 1 (2007)


Reinforcement learning approaches for constrained MDPs

Peter Geibel
Institute of Cognitive Science, AI Group, University of Osnabrück, Germany


Most reinforcement learning approaches consider Markov decision processes (MDPs) with a single criterion. In practical applications, however, we often have to deal with additional criteria, e.g., the energy consumed or the time spent while solving the main task. In this article, we therefore consider MDPs with two criteria, each defined as the expected value of a cumulative return. The second criterion is subject to an inequality constraint. We describe two new reinforcement learning approaches for solving such control problems, discuss their advantages and shortcomings, and present experimental results based on randomly generated MDPs.
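The two-criterion setting sketched above can be written as a constrained optimization problem over policies. The following formulation is a standard one for constrained MDPs and is given here only as an illustration; the symbols (reward $r_t$, cost $c_t$, discount factor $\gamma$, and threshold $\theta$) are generic notation, not necessarily those used in the body of the article:

\[
\max_{\pi}\; V^{\pi} \;=\; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r_t\right]
\qquad \text{subject to} \qquad
C^{\pi} \;=\; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, c_t\right] \;\le\; \theta .
\]

Here $V^{\pi}$ is the primary criterion to be maximized, while the second criterion $C^{\pi}$ (e.g., accumulated energy consumption or time) must stay below the bound $\theta$.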

Keywords: Machine Learning, Reinforcement Learning, Dynamic Programming, Constraints.