Counterfactual Optimism: Rate Optimal Regret for Stochastic Contextual MDPs.
Orin Levy, Asaf B. Cassel, Alon Cohen, Yishay Mansour
Published in: CoRR (2022)
Keyphrases
- regret bounds
- Markov decision processes
- worst case
- reinforcement learning
- dynamic programming
- optimal solution
- stochastic dynamic programming
- finite horizon
- state space
- total reward
- upper bound
- average cost
- average reward
- lower bound
- contextual information
- Monte Carlo
- estimation error
- logical framework
- approximate dynamic programming
- stochastic domains