Two-time-scale online actor-critic paradigm driven by POMDP.
Bo Liu, Haibo He, Daniel W. Repperger
Published in: ICNSC (2010)
Keyphrases
- actor critic
- reinforcement learning
- partially observable markov decision processes
- policy gradient
- dynamical systems
- gradient method
- state space
- average reward
- optimal control
- finite state
- belief state
- markov decision processes
- multi agent
- temporal difference
- reinforcement learning algorithms
- function approximation
- markov decision process
- neuro fuzzy
- dynamic programming