Training cascaded networks for speeded decisions using a temporal-difference loss.
Michael L. Iuzzolino, Michael C. Mozer, Samy Bengio
Published in: CoRR (2021)
Keyphrases
- temporal difference
- function approximation
- reinforcement learning
- TD learning
- evaluation function
- supervised learning
- model free
- Monte Carlo
- step size
- decision makers
- action selection
- decision making
- temporal difference learning
- reinforcement learning algorithms
- temporal difference methods
- training data
- multiscale
- neural network