Delayed reward-based genetic algorithms for partially observable Markov decision problems.
Yoshihide Yamashiro, Atsushi Ueno, Hideaki Takeda
Published in: Systems and Computers in Japan (2004)
Keyphrases
- partially observable
- Markov decision problems
- reinforcement learning
- reward function
- genetic algorithm
- partially observable environments
- state space
- Markov decision processes
- decision problems
- dynamical systems
- average reward
- infinite horizon
- fully observable
- belief state
- optimal policy
- reinforcement learning algorithms
- linear programming
- evolutionary algorithm
- machine learning
- dynamic programming
- multiple agents
- function approximation
- temporal difference
- long run
- transition probabilities
- multi agent
- Markov model
- state variables
- learning algorithm
- heuristic search
- supervised learning