On the Reuse Bias in Off-Policy Reinforcement Learning

Chengyang Ying, Zhongkai Hao, Xinning Zhou, Hang Su, Dong Yan, Jun Zhu

Published in: CoRR (2022)

Keyphrases

reinforcement learning
function approximation
learning algorithm
reinforcement learning algorithms
dynamic programming
markov decision processes
real time
state space
data mining
robotic control
multi agent reinforcement learning
software reuse
temporal difference
optimal control
machine learning
temporal difference learning
stochastic approximation
case study
relational reinforcement learning
direct policy search
action selection
evolutionary algorithm
partially observable
expert systems