OPIRL: Sample Efficient Off-Policy Inverse Reinforcement Learning via Distribution Matching.

Hana Hoshino, Kei Ota, Asako Kanezaki, Rio Yokota

Published in: CoRR (2021)

Keyphrases

inverse reinforcement learning
state space
optimal solution
Bayesian nonparametric
partially observable environments