Q-Learning for MDPs with General Spaces: Convergence and Near Optimality via Quantization under Weak Continuity.
Ali Devran Kara, Naci Saldi, Serdar Yüksel
Published in: J. Mach. Learn. Res. (2023)
Keyphrases
- reinforcement learning
- stochastic shortest path
- state space
- Markov decision processes
- special case
- optimal policy
- stochastic approximation
- policy iteration
- dynamic programming
- reinforcement learning algorithms
- learning rate
- learning algorithm
- average reward
- sufficient conditions
- reward function
- action selection
- decision-theoretic
- multi-agent
- cooperative