Regret Balancing for Bandit and RL Model Selection.
Yasin Abbasi-Yadkori, Aldo Pacchiano, My Phan. Published in: CoRR (2020)
Keyphrases
- model selection
- bandit problems
- reinforcement learning
- regret bounds
- multi-armed bandit
- upper confidence bound
- multi-armed bandit problems
- cross-validation
- parameter estimation
- lower bound
- mixture model
- hyperparameters
- machine learning
- online learning
- decision problems
- sample size
- bayesian learning
- statistical inference
- contextual bandit
- feature selection
- motion segmentation
- variable selection
- optimal policy
- selection criterion
- gaussian process
- random sampling
- statistical learning
- generalization error
- regression model
- markov decision processes
- reward function
- model selection criteria
- error estimation
- learning algorithm
- automatic model selection
- state space
- marginal likelihood
- information criterion
- leave-one-out cross-validation
- maximum likelihood
- loss function
- generalization bounds
- parameter determination
- density estimation
- active learning
- decision trees
- bayesian methods