Mixing Habits and Planning for Multi-Step Target Reaching Using Arbitrated Predictive Actor-Critic.
Farzaneh Sheikhnezhad Fard, Thomas P. Trappenberg
Published in: IJCNN (2018)
Keyphrases
- multi-step
- actor-critic
- reinforcement learning
- temporal difference
- policy gradient
- approximate dynamic programming
- optimal control
- gradient method
- kNN
- k-nearest neighbor
- function approximation
- machine learning
- feature selection
- particle swarm optimization
- partially observable Markov decision processes
- policy iteration