TDprop: Does Jacobi Preconditioning Help Temporal Difference Learning?
Joshua Romoff, Peter Henderson, David Kanaa, Emmanuel Bengio, Ahmed Touati, Pierre-Luc Bacon, Joelle Pineau
Published in: CoRR (2020)
Keyphrases
- temporal difference learning
- fixed point
- function approximation
- reinforcement learning
- approximate value iteration
- game playing
- evaluation function
- temporal difference
- reinforcement learning algorithms
- Markov decision process
- Monte Carlo
- Gaussian process
- policy iteration
- function approximators
- neural network
- state space
- sufficient conditions
- artificial neural networks
- objective function