AntNet with Reward-Penalty Reinforcement Learning.
Pooia Lalbakhsh, Bahram Zaeri, Ali Lalbakhsh, Mehdi N. Fesharaki
Published in: CICSyN (2010)
Keyphrases
- reinforcement learning
- function approximation
- reward function
- routing algorithm
- state space
- average reward
- Markov decision processes
- reinforcement learning algorithms
- model free
- eligibility traces
- optimal policy
- partially observable environments
- dynamic programming
- shortest path algorithm
- temporal difference
- penalty function
- machine learning
- multi agent
- learning problems
- reinforcement learning methods
- policy gradient
- supervised learning
- optimal control
- learning capabilities
- real robot
- transfer learning
- state action
- Markov decision problems
- multi agent reinforcement learning
- policy search
- learning algorithm