Policy Shaping in Domains with Multiple Optimal Policies (Extended Abstract).
Himanshu Sahni, Brent Harrison, Kaushik Subramanian, Thomas Cederborg, Charles L. Isbell Jr., Andrea Thomaz
Published in: AAMAS (2016)
Keyphrases
- optimal policy
- extended abstract
- markov decision processes
- finite horizon
- decision problems
- state space
- multistage
- long run
- reinforcement learning
- dynamic programming
- average reward
- infinite horizon
- markov decision process
- state dependent
- control policies
- finite state
- markov decision problems
- sufficient conditions
- dynamic programming algorithms
- serial inventory systems
- partially observable markov decision processes
- policy iteration
- average cost
- reward function
- initial state
- asymptotically optimal
- lost sales
- expected cost
- bayesian reinforcement learning
- data mining