Learning Adversarial Markov Decision Processes with Bandit Feedback and Unknown Transition.
Chi Jin, Tiancheng Jin, Haipeng Luo, Suvrit Sra, Tiancheng Yu
Published in: ICML (2020)
Keyphrases
- Markov decision processes
- reinforcement learning
- finite state
- state space
- learning algorithm
- partially observable
- supervised learning
- stochastic games
- state abstraction
- policy iteration
- finite horizon
- decision theoretic planning
- transition matrices
- Markov chain
- optimal policy
- dynamic programming
- data mining
- state and action spaces
- macro actions
- model based reinforcement learning