MIRTT: Learning Multimodal Interaction Representations from Trilinear Transformers for Visual Question Answering.
Junjie Wang, Yatai Ji, Jiaqi Sun, Yujiu Yang, Tetsuya Sakai
Published in: EMNLP (Findings) (2021)
Keyphrases
- question answering
- multimodal interaction
- natural language processing
- information extraction
- question classification
- information retrieval
- natural language
- named entities
- cross language
- relation extraction
- expert systems
- low level
- document retrieval
- passage retrieval
- question answering systems
- natural language questions