Minimax Regret Bounds for Reinforcement Learning.
Mohammad Gheshlaghi Azar, Ian Osband, Rémi Munos
Published in: CoRR (2017)
Keyphrases
- minimax regret
- reinforcement learning
- reward function
- preference elicitation
- utility function
- decision problems
- upper bound
- lower bound
- state space
- stochastic programming
- optimal policy
- worst case
- Markov decision processes
- misclassification costs
- multi agent
- learning algorithm
- transfer learning
- multi class
- machine learning
- data sets
- dynamic programming
- cost sensitive
- learning process