A Reinforcement Learning Method for Maximizing Undiscounted Rewards.
Anton Schwartz
Published in: ICML (1993)
Keyphrases
- reinforcement learning
- dynamic programming
- Markov decision processes
- experimental evaluation
- synthetic data
- support vector machine (SVM)
- pairwise
- objective function
- high accuracy
- data sets
- significant improvement
- clustering method
- cost function
- prior knowledge
- preprocessing
- computational complexity
- similarity measure
- feature extraction