Rethinking Multi-Modal Alignment in Video Question Answering from Feature and Sample Perspectives.
Shaoning Xiao, Long Chen, Kaifeng Gao, Zhao Wang, Yi Yang, Jun Xiao
Published in: CoRR (2022)
Keyphrases
- multi modal
- question answering
- video search
- semantic concepts
- information retrieval
- question answering systems
- information extraction
- question classification
- video content
- passage retrieval
- natural language questions
- syntactic information
- audio visual
- multiple modalities
- video sequences
- natural language processing
- video data
- natural language
- video streams
- cross language
- video frames
- qa clef
- multimedia
- image features
- feature vectors
- image annotation
- high dimensional
- video analysis
- visual features
- multimedia data
- machine learning