Discrete-time counterparts of the RL and RC multipliers.
Shuai Wang, William Paul Heath, Joaquín Carrasco. Published in: Int. J. Control (2020)