POTEC: Off-Policy Learning for Large Action Spaces via Two-Stage Policy Decomposition.

Yuta Saito, Jihan Yao, Thorsten Joachims

Published in: CoRR (2024)

Keyphrases

learning algorithm
action space
reinforcement learning
supervised learning
prior knowledge
state action
state and action spaces
action selection
pairwise
probability distribution
state space
domain independent
partially observable
continuous state
reinforcement learning problems
skill learning