Improved Regret Bound and Experience Replay in Regularized Policy Iteration.
Nevena Lazic, Dong Yin, Yasin Abbasi-Yadkori, Csaba Szepesvári
Published in: CoRR (2021)
Keyphrases
- policy iteration
- least squares
- Markov decision processes
- model free
- reinforcement learning
- linear regression
- fixed point
- optimal policy
- dynamic programming
- online learning
- Markov decision process
- finite state
- objective function
- temporal difference
- evaluation function
- regret bounds
- decision trees
- average cost
- probabilistic model
- active learning
- linear programming
- state space