Double Reinforcement Learning for Efficient Off-Policy Evaluation in Markov Decision Processes.
Nathan Kallus, Masatoshi Uehara
Published in: CoRR (2019)
Keyphrases
- markov decision processes
- policy evaluation
- reinforcement learning
- policy iteration
- least squares
- reinforcement learning algorithms
- optimal policy
- temporal difference
- state space
- finite state
- dynamic programming
- model free
- function approximation
- monte carlo
- average reward
- variance reduction
- planning under uncertainty
- average cost
- markov decision process
- infinite horizon
- partially observable
- action space
- stochastic games
- learning algorithm
- decision making
- reward function
- multi agent
- decision processes
- function approximators
- state and action spaces