Reinforcement Learning in Education: A Multi-Armed Bandit Approach

Herkulaas Combrink, Vukosi Marivate, Benjamin Rosman

Published in: CoRR (2022)

Keyphrases

multi-armed bandit
reinforcement learning
multi-armed bandits
state space
decentralized decision making
optimal policy
model-free
machine learning
multi-agent
temporal difference
learning algorithm
learning process
Markov decision processes
e-learning
lower bound
pairwise
probabilistic model
maximum likelihood
hidden variables