A learning algorithm for the finite-time two-armed bandit problem.
Mitsuo Sato, Kenichi Abe, Hiroshi Takeda
Published in: IEEE Trans. Syst. Man Cybern. (1984)
Keyphrases
- learning algorithm
- training data
- finite number
- machine learning
- machine learning algorithms
- learning process
- random sampling
- supervised learning
- learning problems
- active learning
- learning rate
- unlabeled data
- generalization ability
- learning scheme
- rbf network
- bandit problems
- real numbers
- finite automata
- real time
- learning models
- learning tasks
- machine learning methods
- training samples
- markov chain
- state space
- data points
- optimal solution
- data mining