Stochastic matrix games with bandit feedback.

Brendan O'Donoghue, Tor Lattimore, Ian Osband

Published in: CoRR (2020)

Keyphrases

computer games
video games
stochastic optimization
user feedback
Monte Carlo
Hopfield neural network
singular value decomposition
Nash equilibria
multi-armed bandit
feedback mechanisms
game-theoretic
game theory
game playing
game design
random sampling
human computation
covariance matrix
stochastic model
Nash equilibrium
visual feedback
weighted majority
reinforcement learning