TBQ(σ): Improving Efficiency of Trace Utilization for Off-Policy Reinforcement Learning.
Longxiang Shi, Shijian Li, Longbing Cao, Long Yang, Gang Pan
Published in: AAMAS (2019)
Keyphrases
- reinforcement learning
- function approximation
- state space
- information retrieval
- dynamic programming
- optimal policy
- markov decision processes
- robotic control
- database
- reinforcement learning algorithms
- temporal difference
- action selection
- model free
- high efficiency
- multi agent
- multiscale
- case study
- web services
- data sets