Frame-Subtitle Self-Supervision for Multi-Modal Video Question Answering.
Jiong Wang, Zhou Zhao, Weike Jin. Published in: CoRR (2022)
Keyphrases
- multi modal
- question answering
- video frames
- video search
- semantic concepts
- key frames
- natural language processing
- video data
- information extraction
- natural language
- video content
- information retrieval
- video sequences
- question classification
- qa clef
- cross language
- natural language questions
- audio visual
- syntactic information
- passage retrieval
- video shots
- semantic roles
- question answering systems
- multiple modalities
- video analysis
- video streams
- multimedia
- video retrieval
- image annotation
- qa systems
- document retrieval
- answer extraction
- answering questions
- metadata