On the Reuse Bias in Off-Policy Reinforcement Learning.
Chengyang Ying, Zhongkai Hao, Xinning Zhou, Hang Su, Dong Yan, Jun Zhu
Published in: CoRR (2022)
Keyphrases
- reinforcement learning
- function approximation
- learning algorithm
- reinforcement learning algorithms
- dynamic programming
- markov decision processes
- real time
- state space
- data mining
- robotic control
- multi agent reinforcement learning
- software reuse
- temporal difference
- optimal control
- machine learning
- temporal difference learning
- stochastic approximation
- case study
- relational reinforcement learning
- direct policy search
- action selection
- evolutionary algorithm
- partially observable
- expert systems