Multi-CLIP: Contrastive Vision-Language Pre-training for Question Answering tasks in 3D Scenes.
Alexandros Delitzas, Maria Parelli, Nikolas Hars, Georgios Vlassis, Sotiris Anagnostidis, Gregor Bachmann, Thomas Hofmann
Published in: CoRR (2023)
Keyphrases
- question answering
- 3D scene
- natural language
- question answering systems
- single image
- question classification
- natural language processing
- information extraction
- information retrieval
- depth map
- syntactic information
- computer vision
- cross language
- passage retrieval
- answering questions
- answer validation
- natural language questions
- qa clef
- optical flow
- semantic roles
- qa systems
- training set
- image processing
- test set
- multimedia
- document retrieval
- learning algorithm
- answer extraction
- machine learning
- image sequences