Local Metric Learning for Off-Policy Evaluation in Contextual Bandits with Continuous Actions.
Haanvid Lee, Jongmin Lee, Yunseon Choi, Wonseok Jeon, Byung-Jun Lee, Yung-Kyun Noh, Kee-Eung Kim
Published in: CoRR (2022)
Keyphrases
- metric learning
- policy evaluation
- action space
- Markov decision processes
- least squares
- distance metric
- temporal difference
- learning tasks
- pairwise
- multi-task
- semi-supervised
- reinforcement learning
- dimensionality reduction
- function approximation
- kernel matrix
- action selection
- Monte Carlo
- model-free
- distance function
- policy iteration
- state space
- semi-supervised learning
- variance reduction
- feature space
- prior knowledge
- labeled data
- unsupervised learning
- supervised learning
- multi-class
- data points
- dynamic programming
- partially observable Markov decision processes
- high-dimensional