Reinforcement Learning in Education: A Multi-Armed Bandit Approach.
Herkulaas Combrink, Vukosi Marivate, Benjamin Rosman
Published in: CoRR (2022)
Keyphrases
- multi armed bandit
- reinforcement learning
- multi armed bandits
- state space
- decentralized decision making
- optimal policy
- model free
- machine learning
- multi agent
- temporal difference
- learning algorithm
- learning process
- markov decision processes
- e learning
- lower bound
- pairwise
- probabilistic model
- maximum likelihood
- hidden variables