Near-Optimal Reward-Free Exploration for Linear Mixture MDPs with Plug-in Solver.
Xiaoyu Chen, Jiachen Hu, Lin F. Yang, Liwei Wang
Published in: CoRR (2021)
Keyphrases
- reinforcement learning
- Markov decision processes
- average reward
- reward function
- model-based reinforcement learning
- mixture model
- state space
- least squares
- optimal policy
- Gaussian mixture model
- learning algorithm
- long run
- linear systems
- search algorithm
- expectation maximization
- action selection
- policy search
- factored MDPs