Policy Evaluation and Temporal-Difference Learning in Continuous Time and Space: A Martingale Approach.
Yanwei Jia, Xun Yu Zhou
Published in: J. Mach. Learn. Res. (2022)
Keyphrases
- temporal difference learning
- temporal difference
- policy evaluation
- function approximation
- reinforcement learning
- policy iteration
- evaluation function
- monte carlo
- model free
- least squares
- fixed point
- reinforcement learning algorithms
- markov decision processes
- state space
- function approximators
- action selection
- game playing
- search space
- markov chain
- optimal control
- markov decision process
- step size
- machine learning
- supervised learning
- optimal policy
- sufficient conditions