Multi-modal spatial relational attention networks for visual question answering.
Haibo Yao, Lipeng Wang, Chengtao Cai, Yuxin Sun, Zhi Zhang, Yongkang Luo
Published in: Image Vis. Comput. (2023)
Keyphrases
- multi modal
- question answering
- cross modal
- audio visual
- passage retrieval
- information retrieval
- information extraction
- video search
- question classification
- natural language processing
- syntactic information
- natural language questions
- single modality
- visual features
- multi modality
- data model
- relational databases
- question answering systems
- cross language
- multiple modalities
- image annotation
- low level
- qa systems
- candidate answers
- qa clef
- natural language
- high level