Off-Policy Evaluation for Large Action Spaces via Embeddings.
Yuta Saito, Thorsten Joachims
Published in: CoRR (2022)
Keyphrases
- action space
- policy evaluation
- Markov decision processes
- reinforcement learning
- policy iteration
- state space
- temporal difference
- Markov decision problems
- optimal policy
- finite state
- action selection
- reinforcement learning algorithms
- model free
- least squares
- dynamic programming
- Monte Carlo
- average reward
- function approximation
- real valued
- average cost
- Markov decision process
- state action
- function approximators
- infinite horizon
- partially observable
- decision making
- multi agent systems
- particle filter
- computational complexity
- planning problems