Online Learning for Unknown Partially Observable MDPs.
Mehdi Jafarnia-Jahromi, Rahul Jain, Ashutosh Nayyar
Published in: AISTATS (2022)
Keyphrases
- partially observable
- online learning
- Markov decision processes
- Markov decision problems
- state space
- initially unknown
- decision problems
- reinforcement learning
- dynamical systems
- infinite horizon
- partial observability
- e learning
- reward function
- partial observations
- active learning
- optimal policy
- action models
- probabilistic planning
- partially observable environments
- finite state
- belief state
- average reward
- fully observable
- Markov decision process
- Dec-POMDPs
- optimal control
- control system
- computational complexity