Approximate Indexability and Bandit Problems with Concave Rewards and Delayed Feedback

Sudipto Guha, Kamesh Munagala

Published in: APPROX-RANDOM (2013)

Keyphrases

bandit problems
delayed feedback
multi armed bandits
decision problems
piecewise linear
exploration exploitation
objective function
decentralized decision making
learning algorithm
lower bound
genetic algorithm