Offline Policy Optimization in RL with Variance Regularization.
Riashat Islam, Samarth Sinha, Homanga Bharadhwaj, Samin Yeasar Arnob, Zhuoran Yang, Animesh Garg, Zhaoran Wang, Lihong Li, Doina Precup
Published in: CoRR (2022)
Keyphrases
- optimal policy
- reinforcement learning
- global optimization
- optimization problems
- optimization algorithm
- actor critic
- action selection
- real time
- optimization method
- state space
- learning algorithm
- Markov decision process
- policy iteration
- action space
- control policies
- policy gradient
- correlation coefficient
- average reward
- control policy
- infinite horizon
- neural network