Improved Regret Bound and Experience Replay in Regularized Policy Iteration.
Nevena Lazic, Dong Yin, Yasin Abbasi-Yadkori, Csaba Szepesvári
Published in: ICML (2021)
Keyphrases
- policy iteration
- Markov decision processes
- least squares
- reinforcement learning
- model free
- optimal policy
- fixed point
- infinite horizon
- linear regression
- learning algorithm
- Markov decision process
- average cost
- finite state
- optimal control
- belief propagation
- online learning
- graphical models
- active learning
- regret bounds
- finite number
- linear programming
- machine learning
- theoretical guarantees