Is a Video worth n × n Images? A Highly Efficient Approach to Transformer-based Video Question Answering.
Chenyang Lyu, Tianbo Ji, Yvette Graham, Jennifer Foster
Published in: SustaiNLP (2023)
Keyphrases
- question answering
- highly efficient
- video sequences
- visual data
- video data
- multimedia
- video frames
- real time
- qa clef
- question classification
- video content
- image classification
- named entities
- information retrieval
- natural language processing
- image retrieval
- low cost
- image annotation
- quantitative evaluation
- video search
- expert systems
- natural language questions