Learning Adversarial Markov Decision Processes with Delayed Feedback.
Tal Lancewicki, Aviv Rosenberg, Yishay Mansour
Published in: CoRR (2020)
Keyphrases
- Markov decision processes
- reinforcement learning
- state space
- model based reinforcement learning
- stochastic games
- delayed feedback
- optimal policy
- policy iteration
- learning algorithm
- real time dynamic programming
- partially observable
- finite state
- model free
- reinforcement learning algorithms
- infinite horizon
- learning agent
- learning tasks
- state abstraction
- supervised learning
- transition matrices
- dynamic programming
- machine learning