Why Generalization in RL is Difficult: Epistemic POMDPs and Implicit Partial Observability.
Dibya Ghosh, Jad Rahme, Aviral Kumar, Amy Zhang, Ryan P. Adams, Sergey Levine
Published in: NeurIPS (2021)
Keyphrases
- partial observability
- reinforcement learning
- partially observable Markov decision processes
- partially observable
- belief state
- belief space
- state space
- Markov decision process
- fully observable
- Markov decision processes
- planning problems
- decision problems
- partial information
- finite state
- dynamical systems
- optimal policy
- learning agent
- policy gradient
- planning under uncertainty
- dynamic programming
- learning algorithm
- planning under partial observability
- model free
- infinite horizon
- reinforcement learning algorithms
- Markov decision problems
- function approximation
- probability distribution
- machine learning