Robust Policy Computation in Reward-Uncertain MDPs Using Nondominated Policies.
Kevin Regan, Craig Boutilier. Published in: AAAI (2010)
Keyphrases
- optimal policy
- reward function
- Markov decision processes
- average reward
- reinforcement learning
- expected reward
- total reward
- discounted reward
- policy search
- Markov decision process
- long run
- Markov decision problems
- stationary policies
- minimax regret
- finite horizon
- state space
- reinforcement learning algorithms
- inverse reinforcement learning
- control policies
- infinite horizon
- decision problems
- dynamic programming
- finite state
- partially observable Markov decision processes
- decision making
- discount factor
- partially observed
- reinforcement learning problems
- policy iteration
- decision processes
- transition probabilities
- state and action spaces
- initial state
- average cost
- partially observable
- multiple agents
- efficient computation
- approximate policy iteration
- action sets
- policy gradient
- factored MDPs
- robust stability
- function approximation
- control policy
- dynamical systems
- sufficient conditions
- Markov chain
- partially observable environments
- state action