Evolving policies for multi-reward partially observable Markov decision processes (MR-POMDPs).
Harold Soh, Yiannis Demiris
Published in: GECCO (2011)
Keyphrases
- partially observable markov decision processes
- reinforcement learning
- expected reward
- partially observable environments
- average reward
- optimal policy
- finite state
- policy gradient
- belief state
- dynamical systems
- continuous state
- planning under uncertainty
- total reward
- partial observability
- decision problems
- state space
- markov decision processes
- dynamic programming
- reward function
- multi agent
- belief space
- partially observable stochastic games
- stochastic domains
- sequential decision making problems
- planning problems
- partially observable
- policy search
- markov chain
- bayesian reinforcement learning
- incremental pruning
- partially observable markov decision process
- dec pomdps
- infinite horizon
- long run
- approximate solutions
- policy iteration algorithm
- partially observable markov
- control policies
- markov decision process
- learning algorithm
- reinforcement learning algorithms
- model free
- linear programming