Contextual Multi-armed Bandit Algorithm for Semiparametric Reward Model.
Gi-Soo Kim, Myunghee Cho Paik
Published in: CoRR (2019)
Keyphrases
- semiparametric
- objective function
- multi-armed bandit
- probabilistic model
- detection algorithm
- parameter estimation
- learning algorithm
- linear model
- EM algorithm
- prior information
- model free
- similarity measure
- input data
- expectation maximization
- closed form
- maximum likelihood
- mutual information
- distance function
- bayesian inference
- reinforcement learning