Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models.
Wenhao Wu, Xiaohan Wang, Haipeng Luo, Jingdong Wang, Yi Yang, Wanli Ouyang
Published in: CVPR (2023)
Keyphrases
- language model
- pre-trained
- cross-modal
- visual recognition
- document retrieval
- information retrieval
- probabilistic model
- multi-modal
- n-gram
- retrieval model
- speech recognition
- multimedia
- object recognition
- query expansion
- video sequences
- computer vision
- visual data
- training examples
- video data
- test collection
- multimedia data
- relevance model
- video frames
- high-dimensional
- video content
- video retrieval
- feature extraction