Bandits with concave rewards and convex knapsacks.
Shipra Agrawal, Nikhil R. Devanur
Published in: EC (2014)
Keyphrases
- multi armed bandits
- piecewise linear
- convexity properties
- convex functions
- convex concave
- reinforcement learning
- bandit problems
- knapsack problem
- markov decision processes
- objective function
- convex optimization
- stochastic systems
- multi armed bandit
- saddle point
- convex hull
- multiarmed bandit
- risk minimization
- globally optimal
- data sets
- long term and short term
- semidefinite
- upper bound
- dynamic programming
- neural network