Cross-Modal Multistep Fusion Network With Co-Attention for Visual Question Answering.
Mingrui Lao, Yanming Guo, Hui Wang, Xin Zhang
Published in: IEEE Access (2018)
Keyphrases
- question answering
- cross modal
- multi modal
- natural language
- natural language processing
- visual data
- information retrieval
- visual recognition
- image retrieval
- multimedia retrieval
- multimedia databases
- information extraction
- named entities
- visual similarity
- high level
- visual information
- digital libraries
- visual features
- image classification
- image data