HERO: HiErarchical spatio-tempoRal reasOning with Contrastive Action Correspondence for End-to-End Video Object Grounding.
Mengze Li, Tianbao Wang, Haoyu Zhang, Shengyu Zhang, Zhou Zhao, Wenqiao Zhang, Jiaxu Miao, Shiliang Pu, Fei Wu. Published in: ACM Multimedia (2022)