A Geometric Approach to Find Nondominated Policies to Imprecise Reward MDPs.
Valdinei Freire da Silva, Anna Helena Reali Costa
Published in: ECML/PKDD (1) (2011)
Keyphrases
- reward function
- markov decision processes
- minimax regret
- optimal policy
- reinforcement learning
- expected reward
- discounted reward
- average reward
- total reward
- markov decision process
- policy search
- stationary policies
- state space
- markov decision problems
- reinforcement learning algorithms
- inverse reinforcement learning
- finite horizon
- long run
- policy iteration
- partially observable
- dynamic programming
- transition probabilities
- multiple agents
- control policies
- factored mdps
- action space
- decision problems
- hierarchical reinforcement learning
- finite state
- semi markov decision processes
- action sets
- average cost
- control policy
- temporal difference
- state variables
- planning under uncertainty
- uncertain information
- multistage