Policies for Contextual Bandit Problems with Count Payoffs.
Thibault Gisselbrecht, Sylvain Lamprier, Patrick Gallinari. Published in: ICTAI (2015)
Keyphrases
- bandit problems
- multi-armed bandit problems
- decision problems
- optimal policy
- multi-armed bandits
- expected utility
- contextual information
- exploration-exploitation
- incomplete information
- decentralized decision making
- context-sensitive
- Markov decision problems
- perfect information
- partially observable Markov decision processes
- optimal solution
- Markov decision processes
- utility function
- search space
- search algorithm
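The paper itself is not reproduced here, but the keyphrases point at bandit policies for count-valued payoffs and the exploration-exploitation trade-off. As an illustrative sketch only (not the authors' method), Gamma-Poisson Thompson sampling is one standard policy for count rewards: each arm's unknown Poisson rate gets a Gamma posterior, and at each round the arm with the largest posterior sample is pulled. Arm means, horizon, and seed below are made-up example values.

```python
import math
import random

def sample_poisson(lam, rng):
    # Knuth's method: count uniform draws until their product falls below e^{-lam}.
    threshold = math.exp(-lam)
    count, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= threshold:
            return count
        count += 1

def thompson_poisson(true_means, horizon, seed=0):
    """Gamma-Poisson Thompson sampling over `horizon` rounds (illustrative sketch)."""
    rng = random.Random(seed)
    k = len(true_means)
    alpha = [1.0] * k  # Gamma shape: prior pseudo-count of observed events
    beta = [1.0] * k   # Gamma rate: prior pseudo-count of pulls
    pulls = [0] * k
    total_reward = 0
    for _ in range(horizon):
        # Draw one plausible rate per arm from its Gamma posterior
        # (Python's gammavariate takes shape and *scale* = 1 / rate).
        samples = [rng.gammavariate(alpha[i], 1.0 / beta[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        reward = sample_poisson(true_means[arm], rng)
        alpha[arm] += reward  # posterior shape grows with observed counts
        beta[arm] += 1.0      # posterior rate grows with pulls
        pulls[arm] += 1
        total_reward += reward
    return pulls, total_reward
```

With a clearly best arm, the policy concentrates its pulls on it after a short exploration phase, e.g. `thompson_poisson([0.5, 1.0, 3.0], 2000)` pulls the third arm far more often than the others.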