PROTO: Iterative Policy Regularized Offline-to-Online Reinforcement Learning.
Jianxiong Li, Xiao Hu, Haoran Xu, Jingjing Liu, Xianyuan Zhan, Ya-Qin Zhang
Published in: CoRR (2023)
Keyphrases
- reinforcement learning
- optimal policy
- real time
- policy search
- markov decision process
- action selection
- online learning
- policy evaluation
- function approximation
- policy iteration
- control problems
- partially observable
- reinforcement learning algorithms
- state space
- markov decision processes
- least squares
- approximate dynamic programming
- exploration exploitation tradeoff
- reinforcement learning problems
- function approximators
- partially observable environments
- action space
- asymptotically optimal
- control policy
- average reward
- state action
- temporal difference learning
- control policies
- policy gradient
- infinite horizon
- online communities
- decision problems
- transfer learning
- learning algorithm
- neural network