ISL: Optimal Policy Learning With Optimal Exploration-Exploitation Trade-Off.
Lucas Cassano, Ali H. Sayed. Published in: CoRR (2019)
Keyphrases
- optimal policy
- reinforcement learning
- average reward reinforcement learning
- dynamic programming
- state space
- decision problems
- finite horizon
- optimal solution
- markov decision processes
- long run
- state dependent
- multistage
- infinite horizon
- learning algorithm
- graphical models
- average cost
- stochastic demand
- bayesian reinforcement learning