Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding.
Akira Fukui, Dong Huk Park, Daylen Yang, Anna Rohrbach, Trevor Darrell, Marcus Rohrbach
Published in: CoRR (2016)
Keyphrases
- question answering
- visual information
- information extraction
- named entities
- natural language questions
- information retrieval
- syntactic information
- visual features
- natural language processing
- low level
- multi modal
- automatically generated
- video search
- question answering systems
- natural language
- question classification
- qa clef
- artificial intelligence