Reward Imputation with Sketching for Contextual Batched Bandits.
Xiao Zhang, Ninglu Shao, Zihua Si, Jun Xu, Wenhan Wang, Hanjing Su, Ji-Rong Wen
Published in: NeurIPS (2023)
Keyphrases
- multi-armed bandit
- reinforcement learning
- missing values
- contextual information
- missing data
- multi-armed bandits
- stochastic systems
- context-sensitive
- data imputation
- long run
- neural network
- missing data imputation
- contextual knowledge
- data sets
- decision trees
- policy gradient
- search engine
- multiple imputation
- databases