Dynamic Regret of Online Markov Decision Processes.
Peng Zhao
Longfei Li
Zhi-Hua Zhou
Published in:
ICML (2022)
Keyphrases
markov decision processes
online learning
reinforcement learning
finite state
optimal policy
state space
reward function
total reward
transition matrices
policy iteration
decision theoretic planning
dynamic programming
planning under uncertainty
dynamic environments
reachability analysis
factored mdps
reinforcement learning algorithms
partially observable
expected reward
average reward
lower bound
finite horizon
markov decision process
model based reinforcement learning
average cost
infinite horizon
action sets
loss function
online convex optimization
state and action spaces
decision processes
action space
temporal difference
multistage
objective function