On the Reuse Bias in Off-Policy Reinforcement Learning

Chengyang Ying, Zhongkai Hao, Xinning Zhou, Hang Su, Dong Yan, Jun Zhu

Published in: IJCAI (2023)

Keyphrases

reinforcement learning
function approximation
learning algorithm
learning objects
software reuse
reinforcement learning algorithms
temporal difference
machine learning
learning process
state space
multi-agent reinforcement learning
robotic control
learning problems
multi-agent
temporal difference learning
learning classifier systems
direct policy search
reinforcement learning methods
fitted Q iteration
function approximators
robot control
action selection
real time
knowledge management
trade-off
data mining