Optimistic Policy Optimization is Provably Efficient in Non-stationary MDPs.
Han Zhong
Zhuoran Yang
Zhaoran Wang
Csaba Szepesvári
Published in:
CoRR (2021)
Keyphrases
non stationary
finite horizon
optimal policy
adaptive algorithms
markov decision processes
markov decision process
reinforcement learning
autoregressive
partially observable
average reward
empirical mode decomposition
state space
average cost
markov decision problems
continuous state spaces