Multi-CLIP: Contrastive Vision-Language Pre-training for Question Answering tasks in 3D Scenes

Published in: BMVC (2023)

Keyphrases