Optimistic Regret Bounds for Online Learning in Adversarial Markov Decision Processes.
Sang Bin Moon, Abolfazl Hashemi
Published in: CoRR (2024)
Keyphrases
- Markov decision processes
- online learning
- regret bounds
- optimal policy
- finite state
- state space
- online convex optimization
- reinforcement learning
- policy iteration
- transition matrices
- e-learning
- infinite horizon
- dynamic programming
- multi agent
- average reward
- decision theoretic planning
- lower bound
- linear regression
- active learning
- average cost
- Markov decision process
- long run
- machine learning
- fixed point
- linear programming
- worst case
- upper bound
- probabilistic model