Loop estimator for discounted values in Markov reward processes.
Falcon Z. Dai, Matthew R. Walter
Published in: CoRR (2020)
Keyphrases
- average reward
- reinforcement learning
- Markov chain
- optimal policy
- Markov decision processes
- confidence intervals
- least squares
- maximum likelihood
- user defined
- semi-Markov
- estimation algorithm
- Markov model
- standard deviation
- maximum likelihood estimator
- model free
- infinite horizon
- long run
- neural network
- computational models
- image sequences