Convergence and stability analysis of value iteration Q-learning under non-discounted cost for discrete-time optimal control.
Shijie Song, Mingming Zhao, Dawei Gong, Minglei Zhu
Published in: Neurocomputing (2024)
Keyphrases
- optimal control
- stability analysis
- average cost
- infinite horizon
- stochastic shortest path
- Markov decision processes
- optimal control problems
- dynamic programming
- policy iteration
- Markov decision problems
- reinforcement learning
- optimal policy
- nonlinear systems
- control law
- state space
- finite state
- linear quadratic
- finite horizon
- control problems
- feedback control
- Markov decision process
- risk sensitive
- control strategy
- reinforcement learning algorithms
- sliding mode
- production planning
- average reward
- Lyapunov function
- actor critic
- Markov chain
- function approximation
- partially observable Markov decision processes
- inventory level
- learning algorithm
- total cost
- model free
- adaptive control
- learning rate
- neural network
- convergence speed