Doubly Robust Estimator for Off-Policy Evaluation with Large Action Spaces.
Tatsuhiro Shimizu, Laura Forastiere
Published in: SSCI (2023)
Keyphrases
- action space
- markov decision processes
- policy iteration
- temporal difference
- reinforcement learning
- action selection
- state space
- markov decision process
- model free
- optimal policy
- dynamic programming
- real valued
- markov decision problems
- finite state
- function approximation
- reinforcement learning algorithms
- infinite horizon
- least squares
- average cost
- stochastic processes
- variance reduction
- partially observable
- dynamical systems
- markov chain
- multi agent