Reward-Mixing MDPs with a Few Latent Contexts are Learnable.
Jeongyeol Kwon, Yonathan Efroni, Constantine Caramanis, Shie Mannor
Published in: CoRR (2022)
Keyphrases
- reinforcement learning
- markov decision processes
- average reward
- reward function
- expected reward
- discounted reward
- optimal policy
- state space
- long run
- latent variables
- factored mdps
- finite horizon
- reinforcement learning algorithms
- learning algorithm
- function approximation
- dynamic programming
- average cost
- decision theoretic planning
- markov decision process
- policy iteration
- model free
- semi markov decision processes
- pac learning
- temporal difference
- inverse reinforcement learning
- multi agent
- factored markov decision processes