Video-Context Aligned Transformer for Video Question Answering.
Linlin Zong, Jiahui Wan, Xianchao Zhang, Xinyue Liu, Wenxin Liang, Bo Xu
Published in: AAAI (2024)
Keyphrases
- question answering
- video content
- video sequences
- video data
- multimedia
- information extraction
- video frames
- natural language
- named entities
- information retrieval
- relation extraction
- video retrieval
- natural language processing
- semantic roles
- cross language
- open domain question answering
- natural language questions
- qa clef
- syntactic information
- question answering systems
- context dependent
- multi modal