D3Net: A Unified Speaker-Listener Architecture for 3D Dense Captioning and Visual Grounding.

Published in: ECCV (32) (2022)

Keyphrases