Discrete-time counterparts of the RL and RC multipliers.
Shuai Wang, William Paul Heath, Joaquín Carrasco
Published in: CDC (2017)
Keyphrases
- reinforcement learning
- markov chain
- finite state
- state space
- multi agent
- markov processes
- reinforcement learning algorithms
- function approximation
- markov decision processes
- model free
- action selection
- floating point
- learning classifier systems
- optimal policy
- learning process
- complex domains
- markov decision process
- policy iteration
- action space
- learning agents
- autonomous learning
- partially observable domains