Multi-CLIP: Contrastive Vision-Language Pre-training for Question Answering tasks in 3D Scenes.
Alexandros Delitzas, Maria Parelli, Nikolas Hars, Georgios Vlassis, Sotirios-Konstantinos Anagnostidis, Gregor Bachmann, Thomas Hofmann
Published in: BMVC (2023)
Keyphrases
- question answering
- 3d scene
- natural language
- question answering systems
- question classification
- information extraction
- depth map
- information retrieval
- computer vision
- cross language
- single image
- natural language processing
- natural language questions
- answering questions
- passage retrieval
- syntactic information
- candidate answers
- image processing
- qa clef
- answer validation
- high resolution
- training set
- video sequences
- target language
- three dimensional
- qa systems