Stratified Experience Replay: Correcting Multiplicity Bias in Off-Policy Reinforcement Learning.

Brett Daley Cameron Hickert Christopher Amato

Published in: CoRR (2021)

Keyphrases

reinforcement learning
function approximation
decision making
markov decision processes
data sets
learning algorithm
objective function
state space
temporal difference learning
real time
temporal difference
model free
learning problems
optimal policy
dynamic programming
learning process
trade off
expert systems
machine learning
neural network