MMFT-BERT: Multimodal Fusion Transformer with BERT Encodings for Visual Question Answering.
Aisha Urooj Khan, Amir Mazaheri, Niels da Vitoria Lobo, Mubarak Shah
Published in: EMNLP (Findings) (2020)
Keyphrases
- question answering
- multimodal fusion
- information retrieval
- natural language processing
- passage retrieval
- qa clef
- cross language
- information extraction
- high robustness
- question classification
- syntactic information
- natural language questions
- visual information
- question answering systems
- natural language
- low level
- visual features
- multimedia
- answer extraction
- multimedia databases
- gait recognition
- audio visual
- visual data
- document retrieval
- text mining
- hidden markov models
- expert systems
- metadata