Offline Reinforcement Learning via Policy Regularization and Ensemble Q-Functions.
Tao Wang, Shaorong Xie, Mingke Gao, Xue Chen, Zhenyu Zhang, Hang Yu
Published in: ICTAI (2022)
Keyphrases
- reinforcement learning
- optimal policy
- policy search
- projection operator
- action selection
- markov decision process
- learning algorithm
- actor critic
- ensemble learning
- control policy
- reinforcement learning problems
- partially observable environments
- real time
- state and action spaces
- ensemble methods
- function approximation
- partially observable
- reinforcement learning algorithms
- neural network
- action space
- partially observable domains
- training set
- control policies
- machine learning
- learning process
- continuous state spaces
- state space
- approximate dynamic programming
- model free
- basis functions
- markov decision processes
- decision problems
- partially observable markov decision processes
- random forest
- function approximators
- state action
- markov decision problems
- policy gradient
- infinite horizon
- reward function
- training data
- decision trees
- feature selection
- temporal difference