Cross-Modal Attention Consistency for Video-Audio Unsupervised Learning.
Shaobo Min, Qi Dai, Hongtao Xie, Chuang Gan, Yongdong Zhang, Jingdong Wang
Published in: CoRR (2021)
Keyphrases
- cross modal
- unsupervised learning
- visual data
- multi modal
- video data
- video sequences
- multimedia retrieval
- multiple modalities
- semantic concepts
- multimedia databases
- supervised learning
- multimedia
- multimedia data
- video frames
- dimensionality reduction
- video analysis
- visual recognition
- image retrieval
- visual information
- semi supervised
- video content
- object recognition
- image data
- space time
- video streams
- video clips
- visual features
- visual similarity
- text classification
- active learning
- relevance feedback
- image features