MILP based value backups in partially observed Markov decision processes (POMDPs) with very large or continuous action and observation spaces.
Rakshita Agrawal, Matthew J. Realff, Jay H. Lee
Published in: Comput. Chem. Eng. (2013)
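For orientation, below is a minimal sketch of the standard point-based value backup that work of this kind builds on. It is the textbook PBVI-style backup over a small discrete POMDP, not the authors' MILP formulation for very large or continuous action and observation spaces; all names (pbvi_backup, T, O, R, Gamma) are illustrative assumptions.

```python
import numpy as np

def pbvi_backup(b, Gamma, T, O, R, gamma):
    """One point-based value backup at belief point b.

    b     : (S,)      belief over states
    Gamma : (K, S)    current set of alpha-vectors
    T     : (A, S, S) T[a, s, s'] = P(s' | s, a)
    O     : (A, S, O) O[a, s', o] = P(o | s', a)
    R     : (S, A)    immediate reward
    gamma : discount factor
    Returns the best new alpha-vector (S,) for belief b.
    """
    A, S, _ = T.shape
    n_obs = O.shape[2]
    best_alpha, best_val = None, -np.inf
    for a in range(A):
        # g[o, i, s] = gamma * sum_{s'} P(o|s',a) P(s'|s,a) alpha_i(s')
        g = np.zeros((n_obs, len(Gamma), S))
        for o in range(n_obs):
            for i, alpha in enumerate(Gamma):
                g[o, i] = gamma * T[a] @ (O[a][:, o] * alpha)
        # for each observation, keep the alpha-vector best aligned with b
        alpha_a = R[:, a].copy()
        for o in range(n_obs):
            alpha_a += g[o, np.argmax(g[o] @ b)]
        val = alpha_a @ b
        if val > best_val:
            best_val, best_alpha = val, alpha_a
    return best_alpha
```

The title suggests that the paper's MILP machinery replaces the explicit enumeration over actions and observations in the loops above, which is precisely what becomes intractable when those spaces are very large or continuous.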
Keyphrases
- Markov decision processes
- partially observed
- partially observable Markov decision processes
- continuous action
- action space
- continuous state
- finite state
- state space
- belief state
- optimal policy
- reinforcement learning
- policy search
- planning under uncertainty
- dynamic programming
- partially observable
- policy iteration
- infinite horizon
- linear program
- reinforcement learning algorithms
- decision processes
- Dec-POMDPs
- Markov decision process
- decision theoretic planning
- finite horizon
- control policies
- average cost
- decision problems
- dynamical systems
- state dependent
- stochastic games
- initial state
- policy gradient
- reward function
- optimal control
- linear programming