Cross-Modal Reasoning with Event Correlation for Video Question Answering.
Chengxiang Yin, Zhengping Che, Kun Wu, Zhiyuan Xu, Qinru Qiu, Jian Tang
Published in: CoRR (2023)
Keyphrases
- question answering
- cross modal
- multi modal
- visual data
- answering questions
- information retrieval
- video data
- natural language
- natural language processing
- video sequences
- video content
- information extraction
- multimedia
- video frames
- semantic concepts
- image retrieval
- named entities
- video streams
- multimedia data
- multimedia databases
- knowledge representation
- knowledge base
- video retrieval
- human actions
- visual information
- action recognition
- information retrieval systems
- knn