Convergence of Finite Memory Q-Learning for POMDPs and Near Optimality of Learned Policies under Filter Stability.
Ali Devran Kara, Serdar Yüksel
Published in: CoRR (2021)
Keyphrases
- stationary policies
- optimal policy
- Markov decision processes
- reinforcement learning
- state space
- partially observable Markov decision processes
- average cost
- average reward
- Markov decision process
- Markov decision problems
- stochastic shortest path
- policy iteration
- finite state
- state and action spaces
- dynamic programming
- reinforcement learning algorithms
- stochastic approximation
- partially observable
- reward function
- discounted reward
- infinite horizon
- policy search
- continuous state spaces
- belief state
- finite number
- decision problems
- memory requirements
- sufficient conditions
- continuous state
- function approximation
- multi agent
- long run
- learning algorithm
- control policies
- finite horizon
- state information
- dynamical systems
- action space
- initial state
- predictive state representations
- state action
- hierarchical reinforcement learning
- stochastic games
- temporal difference
- model free
- expected reward
- Markov chain
- multiclass queueing networks
- convergence speed
- multi agent reinforcement learning
- optimality criterion
- linear programming
- optimal solution