Efficient Inference Without Trading-off Regret in Bandits: An Allocation Probability Test for Thompson Sampling.
Nina Deliu, Joseph Jay Williams, Sofia S. Villar
Published in: CoRR (2021)
Keyphrases
- multi-armed bandit
- efficient inference
- fully connected
- probabilistic inference
- reinforcement learning
- conditional random fields
- regret bounds
- hidden variables
- Markov random field
- human pose estimation
- approximate inference
- structured prediction
- probability distribution
- random sampling
- missing values
- linear regression
- higher order
- graph structure
- exact inference
- worst case
- factor graphs
- learning algorithm