Bandits with concave rewards and convex knapsacks.

Shipra Agrawal, Nikhil R. Devanur

Published in: EC (2014)

Keyphrases

multi armed bandits
piecewise linear
convexity properties
convex functions
convex concave
reinforcement learning
bandit problems
knapsack problem
markov decision processes
objective function
convex optimization
stochastic systems
multi armed bandit
saddle point
convex hull
multiarmed bandit
risk minimization
globally optimal
data sets
long term and short term
semidefinite
upper bound
dynamic programming
neural network