Approximate value iteration with randomized policies.
Daniela Pucci de Farias
Benjamin Van Roy
Published in:
CDC (2000)
Keyphrases
approximate value iteration
fixed point
temporal difference learning
loss bounds
optimal policy
evaluation function
Markov decision process
reinforcement learning
function approximation
Markov decision problems
neural network
dynamic programming
temporal difference