Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction.
Aviral Kumar, Justin Fu, Matthew Soh, George Tucker, Sergey Levine
Published in: NeurIPS (2019)
Keyphrases
- error reduction
- reinforcement learning
- multi-agent
- classification error
- learning algorithm
- iterative learning
- state space
- significant improvement
- classification accuracy
- semi-supervised
- information extraction
- optimal policy
- feature selection
- dynamic programming
- data sets
- multi-class
- facial expressions
- class labels
- class imbalance