Online Learning in Markov Decision Processes with Adversarially Chosen Transition Probability Distributions
Yasin Abbasi-Yadkori, Peter L. Bartlett, Csaba Szepesvári
Published in: CoRR (2013)
Keyphrases
- markov decision processes
- online learning
- probability distribution
- finite state
- state space
- optimal policy
- policy iteration
- reinforcement learning
- random variables
- e learning
- state transition
- dynamic programming
- reachability analysis
- bayesian networks
- action space
- decision theoretic planning
- partially observable
- markov decision process
- factored mdps
- average cost
- reinforcement learning algorithms
- infinite horizon
- transition matrices
- model based reinforcement learning
- decision processes
- average reward
- planning under uncertainty
- risk sensitive
- state and action spaces
- action sets
- stochastic processes
- state abstraction
- finite horizon
- data mining
- semi markov decision processes
- real time dynamic programming
- markov chain
- active learning