Randomized Exploration is Near-Optimal for Tabular MDP.
Zhihan Xiong, Ruoqi Shen, Simon S. Du
Published in: CoRR (2021)
Keyphrases
- Markov decision processes
- exploration strategy
- Markov decision process
- optimal policy
- utility function
- reinforcement learning
- decision forest
- action selection
- linear program
- search strategies
- finite state
- active exploration
- linear programming
- dynamic programming
- machine learning
- provably near optimal
- controlled tabular adjustment
- dynamical systems
- information systems