Multi-level, multi-modal interactions for visual question answering over text in images.
Jincai Chen, Sheng Zhang, Jiangfeng Zeng, Fuhao Zou, Yuan-Fang Li, Tao Liu, Ping Lu
Published in: World Wide Web (2022)
Keyphrases
- multi modal
- question answering
- multiple modalities
- image annotation
- web images
- auto annotation
- cross modal
- information retrieval
- video search
- syntactic information
- single modality
- natural language processing
- audio visual
- image features
- image retrieval
- information extraction
- question classification
- visual features
- named entities
- image collections
- visual information
- passage retrieval
- natural language
- semantic concepts
- image search
- high dimensional
- image classification
- image registration
- multimedia
- visual data
- qa clef
- feature extraction
- cross language
- question answering systems
- metadata