Two Time-Scale Stochastic Approximation with Controlled Markov Noise and Off-Policy Temporal-Difference Learning.
Prasenjit Karmakar, Shalabh Bhatnagar
Published in: Math. Oper. Res. (2018)
Keyphrases
- stochastic approximation
- temporal difference learning
- function approximation
- fixed point
- reinforcement learning
- evaluation function
- temporal difference
- game playing
- policy iteration
- reinforcement learning algorithms
- Monte Carlo
- Markov decision process
- Markov chain
- function approximators
- sufficient conditions
- optimal policy
- graph cuts
- state space
- neural network