Is the Bellman residual a bad proxy?
Matthieu Geist, Bilal Piot, Olivier Pietquin
Published in: NIPS (2017)
Keyphrases
- Bellman residual
- sample path
- approximation methods
- least squares
- policy iteration
- fixed point
- optimization criterion
- asymptotic analysis
- Markov decision processes
- Markov chain
- policy evaluation
- hybrid algorithms
- model free
- finite state
- optimal policy
- Markov decision process
- large deviations
- Markov decision problems
- objective function