MMFT-BERT: Multimodal Fusion Transformer with BERT Encodings for Visual Question Answering.
Aisha Urooj Khan, Amir Mazaheri, Niels da Vitoria Lobo, Mubarak Shah
Published in: CoRR (2020)
Keyphrases
- question answering
- multimodal fusion
- information retrieval
- question classification
- natural language processing
- visual information
- information extraction
- qa clef
- natural language
- cross language
- passage retrieval
- high robustness
- natural language questions
- audio visual
- visual features
- relevance feedback
- syntactic information
- low level
- candidate answers
- answering questions
- question answering systems
- expert systems
- answer validation
- artificial intelligence
- multimodal interfaces
- qa systems
- visual data
- machine learning