Optimal Uniform OPE and Model-based Offline Reinforcement Learning in Time-Homogeneous, Reward-Free and Task-Agnostic Settings.
Ming Yin, Yu-Xiang Wang
Published in: NeurIPS (2021)
Keyphrases
- reinforcement learning
- model free
- dynamic programming
- function approximation
- average reward
- optimal control
- state space
- average reward reinforcement learning
- control policy
- markov decision processes
- total reward
- approximate dynamic programming
- reinforcement learning algorithms
- data sets
- learning process
- learning environment
- eligibility traces
- partially observable environments
- initially unknown
- reinforcement learning methods
- optimal strategy
- average cost
- partially observable
- reward function
- learning classifier systems
- transfer learning
- optimal policy
- supervised learning
- machine learning