Efficient End-to-End Video Question Answering with Pyramidal Multimodal Transformer.
Min Peng, Chongyang Wang, Yu Shi, Xiang-Dong Zhou
Published in: CoRR (2023)
Keyphrases
- end to end
- question answering
- scalable video
- natural language
- information retrieval
- named entities
- question classification
- information extraction
- syntactic information
- sentence retrieval
- natural language processing
- qa clef
- video data
- cross language
- congestion control
- video sequences
- machine learning
- passage retrieval
- question answering systems
- natural language questions
- video content
- multi modal
- artificial intelligence
- video streams
- answer validation
- textual entailment recognition
- open domain question answering