Off-Policy Evaluation in Partially Observable Environments.
Guy Tennenholtz, Shie Mannor, Uri Shalit
Published in: CoRR (2019)
Keyphrases
- partially observable environments
- policy evaluation
- reinforcement learning
- partially observable Markov decision processes
- reinforcement learning algorithms
- temporal difference
- model free
- Markov decision processes
- inverse reinforcement learning
- partially observable
- least squares
- finite state
- Monte Carlo
- function approximation
- policy iteration
- optimal policy
- dynamical systems
- dynamic programming
- state space
- variance reduction
- decision problems
- multi agent
- semi parametric
- evaluation function
- belief state
- Markov decision problems
- planning problems
- reward function
- infinite horizon
- machine learning
- learning tasks
- graphical models
- supervised learning
- cost function
- multi agent systems
- learning algorithm