Towards Hyperparameter-free Policy Selection for Offline Reinforcement Learning.
Siyuan Zhang, Nan Jiang
Published in: CoRR (2021)
Keyphrases
- reinforcement learning
- optimal policy
- action selection
- Markov decision process
- policy search
- reward function
- Markov decision processes
- actor critic
- function approximation
- model selection
- Gaussian processes
- reinforcement learning algorithms
- partially observable environments
- partially observable
- Markov decision problems
- real time
- partially observable domains
- learning algorithm
- Gaussian process
- selection algorithm
- optimal control
- reinforcement learning problems
- action space
- control policy
- state space
- RL algorithms
- control policies
- machine learning
- inverse reinforcement learning
- approximate dynamic programming
- function approximators
- policy iteration
- policy evaluation
- decision problems
- transition model
- state action
- finite state
- infinite horizon