Twenty Questions with Noise: Bayes Optimal Policies for Entropy Loss.
Bruno Jedynak, Peter I. Frazier, Raphael Sznitman
Published in: J. Appl. Probab. (2012)
Keyphrases
- optimal policy
- Markov decision processes
- decision problems
- state space
- dynamic programming
- finite horizon
- reinforcement learning
- long run
- finite state
- infinite horizon
- state dependent
- dynamic programming algorithms
- average cost
- Markov decision process
- average reward reinforcement learning
- multistage
- initial state
- average reward
- control policies
- partially observable Markov decision processes
- sufficient conditions
- serial inventory systems
- machine learning
- Bayesian reinforcement learning
- reinforcement learning algorithms