Can Temporal-Difference and Q-Learning Learn Representation? A Mean-Field Theory.
Yufeng Zhang, Qi Cai, Zhuoran Yang, Yongxin Chen, Zhaoran Wang
Published in: NeurIPS (2020)
Keyphrases
- temporal difference
- function approximation
- reinforcement learning
- td learning
- function approximators
- model free
- reinforcement learning algorithms
- temporal difference learning
- action selection
- temporal difference methods
- evaluation function
- policy iteration
- step size
- state space
- td methods
- actor critic
- policy evaluation
- learning algorithm
- learning tasks
- markov random field
- multi agent
- state action
- machine learning
- monte carlo
- reinforcement learning methods
- decision trees
- multiscale
- learning agent
- markov chain
- optimal policy