Learning parameterized policies for Markov decision processes through demonstrations.
Manjesh Kumar Hanawal, Hao Liu, Henghui Zhu, Ioannis Ch. Paschalidis
Published in: CDC (2016)
Keyphrases
- markov decision processes
- reinforcement learning
- optimal policy
- real time dynamic programming
- finite state
- stochastic games
- macro actions
- state abstraction
- decision processes
- learning algorithm
- transition matrices
- model based reinforcement learning
- markov decision process
- partially observable
- dynamic programming
- state space
- learning tasks
- supervised learning
- policy iteration
- finite horizon
- average cost
- average reward
- planning under uncertainty
- reward function
- discounted reward
- markov games