Temporal Pyramid Transformer with Multimodal Interaction for Video Question Answering.
Min Peng, Chongyang Wang, Yuan Gao, Yu Shi, Xiang-Dong Zhou
Published in: CoRR (2021)
Keyphrases
- question answering
- multimodal interaction
- question classification
- video sequences
- information extraction
- information retrieval
- natural language
- video data
- multimedia
- video content
- question answering systems
- named entities
- natural language processing
- natural language questions
- qa clef
- text to speech
- cross language
- video retrieval
- answering questions
- answer validation
- sentence retrieval
- open domain question answering
- passage retrieval
- syntactic information
- video frames
- candidate answers
- semantic roles
- computer vision