Corruption-Robust Algorithms with Uncertainty Weighting for Nonlinear Contextual Bandits and Markov Decision Processes.
Chenlu Ye, Wei Xiong, Quanquan Gu, Tong Zhang
Published in: ICML (2023)
Keyphrases
- Markov decision processes
- policy iteration
- factored MDPs
- optimal policy
- reachability analysis
- learning algorithm
- reinforcement learning
- computational complexity
- dynamic programming
- state space
- finite state
- transition matrices
- data mining
- infinite horizon
- partially observable Markov decision processes
- decision theoretic planning