Multi-Granularity Contrastive Cross-Modal Collaborative Generation for End-to-End Long-Term Video Question Answering.
Ting Yu, Kunhao Fu, Jian Zhang, Qingming Huang, Jun Yu
Published in: IEEE Trans. Image Process. (2024)
Keyphrases
- end to end
- question answering
- cross modal
- multi modal
- visual data
- video data
- multi user
- video sequences
- information extraction
- natural language processing
- multimedia
- video streams
- multimedia databases
- natural language
- semantic concepts
- information retrieval
- video content
- video frames
- video analysis
- image retrieval
- multimedia data
- key frames
- databases
- image sequences
- image data
- query processing