Learning Adversarial Markov Decision Processes with Delayed Feedback.
Tal Lancewicki, Aviv Rosenberg, Yishay Mansour
Published in: AAAI (2022)
Keyphrases
- least squares
- Markov decision processes
- policy iteration
- reinforcement learning
- learning algorithm
- delayed feedback
- optimal policy
- model based reinforcement learning
- partially observable
- learning tasks
- reinforcement learning algorithms
- stochastic games
- state space
- function approximation
- infinite horizon
- supervised learning
- state abstraction
- factored MDPs
- macro actions
- action sets