Local Metric Learning for Off-Policy Evaluation in Contextual Bandits with Continuous Actions.
Haanvid Lee, Jongmin Lee, Yunseon Choi, Wonseok Jeon, Byung-Jun Lee, Yung-Kyun Noh, Kee-Eung Kim
Published in: NeurIPS (2022)
Keyphrases
- metric learning
- policy evaluation
- action space
- least squares
- Markov decision processes
- temporal difference
- distance metric
- semi supervised
- reinforcement learning
- learning tasks
- Monte Carlo
- policy iteration
- dimensionality reduction
- action selection
- pairwise
- multi task
- function approximation
- model free
- distance function
- feature space
- kernel matrix
- state space
- semi supervised learning
- Markov random field
- optimal policy
- semi parametric
- variance reduction
- Gaussian process
- partially observable
- objective function