Optimistic Regret Bounds for Online Learning in Adversarial Markov Decision Processes.

Sang Bin Moon, Abolfazl Hashemi

Published in: CoRR (2024)

Keyphrases

markov decision processes
online learning
regret bounds
optimal policy
finite state
state space
online convex optimization
reinforcement learning
policy iteration
transition matrices
e-learning
infinite horizon
dynamic programming
multi agent
average reward
decision theoretic planning
lower bound
linear regression
active learning
average cost
markov decision process
long run
machine learning
fixed point
linear programming
worst case
upper bound
probabilistic model