Provably Good Batch Off-Policy Reinforcement Learning Without Great Exploration.
Yao Liu, Adith Swaminathan, Alekh Agarwal, Emma Brunskill
Published in: NeurIPS (2020)
Keyphrases
- reinforcement learning
- active exploration
- exploration strategy
- exploration exploitation
- action selection
- function approximation
- batch mode
- model based reinforcement learning
- model free
- exploration exploitation tradeoff
- worst case
- learning algorithm
- state space
- reinforcement learning algorithms
- machine learning
- autonomous learning
- temporal difference
- temporal difference learning
- batch size
- multi agent
- optimal policy
- neural network
- information visualization
- robot control
- optimal control
- transition model
- relevance feedback
- dynamic programming