Slowly Changing Adversarial Bandit Algorithms are Provably Efficient for Discounted MDPs
Ian A. Kash, Lev Reyzin, Zishun Yu. Published in: CoRR (2022)
Keyphrases
- markov decision processes
- data structure
- computationally efficient
- computationally intensive
- highly efficient
- computationally expensive
- reinforcement learning
- computational cost
- worst case
- factored mdps
- policy evaluation
- average reward
- data mining
- data mining techniques
- policy iteration
- average cost
- markov chain
- least squares
- state space
- significant improvement
- computational complexity
- multi agent
- decision trees
- learning algorithm