Stratified Experience Replay: Correcting Multiplicity Bias in Off-Policy Reinforcement Learning.
Brett DaleyCameron HickertChristopher AmatoPublished in: CoRR (2021)
Keyphrases
- reinforcement learning
- function approximation
- decision making
- markov decision processes
- data sets
- learning algorithm
- objective function
- state space
- temporal difference learning
- real time
- temporal difference
- model free
- learning problems
- optimal policy
- dynamic programming
- learning process
- trade off
- expert systems
- machine learning
- neural network