Video Question Answering Using CLIP-Guided Visual-Text Attention.
Shuhong Ye, Weikai Kong, Chenglin Yao, Jianfeng Ren, Xudong Jiang
Published in: CoRR (2023)
Keyphrases
- question answering
- video clips
- information retrieval
- syntactic information
- video search
- news video
- text summarization
- information extraction
- textual entailment recognition
- natural language processing
- key frames
- visual data
- video content
- visual information
- cross language
- free text
- natural language
- video frames
- video data
- text documents
- visual features
- multimedia
- question classification
- named entities
- passage retrieval
- video retrieval
- video sequences
- question answering systems
- text mining
- text retrieval
- video streams
- qa clef
- multi modal
- question answer pairs
- low level
- semantic information
- qa systems
- natural language questions
- answer validation
- keywords
- semantic roles
- text classification