Maximum reward reinforcement learning: A non-cumulative reward criterion.
Kian Hong Quah, Hiok Chai Quek
Published in: Expert Syst. Appl. (2006)
Keyphrases
- reinforcement learning
- partially observable environments
- reward function
- average reward
- eligibility traces
- state space
- function approximation
- optimal policy
- Markov decision processes
- total reward
- multi-agent
- learning algorithm
- policy gradient
- supervised learning
- optimality criterion
- partially observable
- reinforcement learning algorithms
- transfer learning
- Markov decision process
- function approximators
- optimization criterion
- learning capabilities
- model-free
- learning problems