CLIP-Guided Vision-Language Pre-training for Question Answering in 3D Scenes.
Maria Parelli, Alexandros Delitzas, Nikolas Hars, Georgios Vlassis, Sotiris Anagnostidis, Gregor Bachmann, Thomas Hofmann
Published in: CVPR Workshops (2023)
Keyphrases
- question answering
- 3D scene
- natural language
- question classification
- single image
- depth map
- information retrieval
- passage retrieval
- information extraction
- natural language questions
- qa clef
- natural language processing
- test set
- optical flow
- cross language
- computer vision
- training set
- syntactic information
- answer validation
- qa systems
- question answering systems
- candidate answers
- answering questions
- document retrieval
- three dimensional