Near-Optimal Reward-Free Exploration for Linear Mixture MDPs with Plug-in Solver.
Xiaoyu Chen
Jiachen Hu
Lin Yang
Liwei Wang
Published in:
ICLR (2022)
Keyphrases
reinforcement learning
Markov decision processes
reward function
average reward
model based reinforcement learning
state space
search strategies
long run
finite horizon
dynamic programming
mixture model
optimal policy
Markov decision process
policy search
expected reward