No-regret learning with high-probability in adversarial Markov decision processes.
Mahsa Ghasemi, Abolfazl Hashemi, Haris Vikalo, Ufuk Topcu
Published in: UAI (2021)
Keyphrases
- markov decision processes
- reinforcement learning
- stochastic games
- learning algorithm
- finite state
- learning tasks
- state space
- partially observable
- transition matrices
- model based reinforcement learning
- optimal policy
- supervised learning
- dynamic programming
- reward function
- policy iteration
- multi agent
- optimal control
- markov chain
- planning under uncertainty
- continuous state spaces
- state abstraction
- decision theoretic planning
- data mining