Near-Optimal Pure Exploration in Matrix Games: A Generalization of Stochastic Bandits & Dueling Bandits.
Arnab Maiti
Ross Boczar
Kevin G. Jamieson
Lillian J. Ratliff
Published in:
CoRR (2023)
Keyphrases
stochastic systems
multi-armed bandits
regret bounds
stochastic models
multi-armed bandit
stochastic optimization
sample path
lower bound
confidence intervals
data sets
multi-agent
cooperative
online learning
singular values
Hopfield neural network
bandit problems