A distributional code for value in dopamine-based reinforcement learning.
Will Dabney, Zeb Kurth-Nelson, Naoshige Uchida, Clara Kwon Starkweather, Demis Hassabis, Rémi Munos, Matthew Botvinick
Published in: Nat. (2020)
Keyphrases
- reinforcement learning
- temporal difference
- source code
- basal ganglia
- function approximation
- reinforcement learning algorithms
- model free
- action selection
- learning algorithm
- semi markov
- markov decision processes
- supervised learning
- multi agent
- robotic control
- dynamic programming
- state space
- optimal policy
- learning problems
- optimal control
- transfer learning
- learning process
- learning environment
- co occurrence
- learning capabilities
- robot control
- function approximators
- error correcting
- reinforcement learning methods
- java programs
- information systems
- autonomous learning
- data sets