Controller exploitation-exploration reinforcement learning architecture for computing near-optimal policies.
Erick Asiain, Julio B. Clempner, Alexander S. Poznyak. Published in: Soft Comput. (2019)
Keyphrases
- reinforcement learning
- optimal policy
- exploration exploitation tradeoff
- policy iteration algorithm
- control policy
- control architecture
- optimal control
- exploration strategy
- function approximation
- policy search
- action selection
- active exploration
- control policies
- Markov decision process
- real time
- management system
- finite state
- control system
- Markov decision processes
- hierarchical reinforcement learning
- provably near optimal
- reward function
- learning algorithm
- reinforcement learning algorithms
- control strategy
- fitted Q iteration
- model based reinforcement learning
- long run
- control strategies
- initial state
- controller design
- Markov decision problems
- control algorithm
- autonomous agents
- exploration exploitation
- particle swarm optimization
- policy iteration
- learning process
- multi agent
- temporal difference