Reward-Mixing MDPs with Few Latent Contexts are Learnable.
Jeongyeol Kwon, Yonathan Efroni, Constantine Caramanis, Shie Mannor. Published in: ICML (2023)
Keyphrases
- reinforcement learning
- Markov decision processes
- average reward
- reward function
- state space
- optimal policy
- discounted reward
- long run
- expected reward
- latent variables
- finite state
- learning algorithm
- machine learning
- function approximation
- planning under uncertainty
- Markov decision process
- decision theoretic planning
- inverse reinforcement learning
- reinforcement learning algorithms
- average cost
- partially observable
- Markov chain
- finite horizon
- learning agent
- action space
- PAC learning
- model free
- stationary policies
- factored MDPs
- learning from positive data
- transition probabilities
- dynamical systems
- least squares