Policy Evaluation and Temporal-Difference Learning in Continuous Time and Space: A Martingale Approach.
Yanwei Jia, Xun Yu Zhou
Published in: CoRR (2021)
Keyphrases
- temporal difference learning
- temporal difference
- policy evaluation
- function approximation
- reinforcement learning
- policy iteration
- monte carlo
- model free
- evaluation function
- reinforcement learning algorithms
- markov decision processes
- fixed point
- markov chain
- step size
- game playing
- least squares
- state space
- action selection
- function approximators
- action space
- optimal control
- training data
- markov decision process
- search space
- decision making
- dynamical systems
- support vector machine