Kernel Metric Learning for In-Sample Off-Policy Evaluation of Deterministic RL Policies.
Haanvid Lee, Tri Wahyu Guntara, Jongmin Lee, Yung-Kyun Noh, Kee-Eung Kim
Published in: ICLR (2024)
Keyphrases
- metric learning
- policy evaluation
- kernel matrix
- optimal policy
- reinforcement learning
- policy iteration
- least squares
- partially observable Markov decision processes
- temporal difference
- Markov decision processes
- model-free
- Monte Carlo
- feature space
- function approximation
- learning tasks
- distance metric
- semi-supervised
- Markov decision problems
- variance reduction
- pairwise
- distance function
- multi-task
- Markov decision process
- dimensionality reduction
- sample size
- semi-parametric
- dynamic programming
- semi-supervised learning
- reward function
- unsupervised learning
- kernel methods
- state space
- Gaussian process
- finite state
- infinite horizon
- kernel function
- partially observable
- feature selection
- reinforcement learning algorithms
- prior knowledge
- active learning
- evaluation function
- machine learning
- linear programming
- supervised learning