A Sliding-Window Algorithm for Markov Decision Processes with Arbitrarily Changing Rewards and Transitions.
Pratik Gajane, Ronald Ortner, Peter Auer
Published in: CoRR (2018)
Keyphrases
- markov decision processes
- sliding window
- dynamic programming
- boyer moore
- model based reinforcement learning
- fixed size
- window size
- pattern matching
- reinforcement learning
- state space
- linear programming
- learning algorithm
- optimal solution
- data streams
- np hard
- average reward
- finite state
- real time dynamic programming
- objective function
- compact data structure
- least squares
- total reward
- state and action spaces
- decision theoretic planning
- continuous state spaces
- policy iteration
- search space
- optimal policy