A reinforcement learning method using a dynamic reinforcement function based on action selection probability.
Yugo Hasegawa, Satoko Takada, Hidehiro Nakano, Shuichi Arai, Arata Miyauchi
Published in: Systems and Computers in Japan (2007)
Keyphrases
- reinforcement learning
- action selection
- significant improvement
- detection method
- preprocessing
- segmentation method
- dynamic programming
- high accuracy
- policy search
- continuous state and action spaces
- function approximators
- cross entropy
- temporal difference
- model free
- function approximation
- clustering method
- step size
- cost function