Publication: Non-Markovian Reward Modelling from Trajectory Labels via Interpretable Multiple Instance Learning.