Look at What I'm Doing: Self-Supervised Spatial Grounding of Narrations in Instructional Videos.

Published in: NeurIPS (2021)

Keyphrases