Efficient End-to-End Video Question Answering with Pyramidal Multimodal Transformer.
Min Peng, Chongyang Wang, Yu Shi, Xiang-Dong Zhou
Published in: AAAI (2023)
Keyphrases
- end to end
- question answering
- scalable video
- question classification
- information extraction
- information retrieval
- natural language
- natural language processing
- multimedia
- cross language
- named entities
- syntactic information
- congestion control
- question answering systems
- natural language questions
- video frames
- passage retrieval
- answer validation
- video sequences
- multi modal
- qa clef
- answering questions
- video content
- video streams
- video data
- search engine
- semantic roles
- video retrieval
- machine learning
- candidate answers
- sentence retrieval
- wordnet
- knowledge representation
- open domain question answering