Learn to Exceed: Stereo Inverse Reinforcement Learning with Concurrent Policy Optimization.

Feng Tao, Yongcan Cao

Published in: CoRR (2020)

Keyphrases

inverse reinforcement learning
partially observable environments
Bayesian nonparametric
reward function
preference elicitation
state space
partially observable
mathematical programming
temporal difference
optimal solution
special case
hidden Markov models
NP-hard
semi-supervised
optimal policy
utility function