Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models.
Wenhao Wu, Xiaohan Wang, Haipeng Luo, Jingdong Wang, Yi Yang, Wanli Ouyang
Published in: CoRR (2023)
Keyphrases
- language model
- pre-trained
- cross-modal
- probabilistic model
- multi-modal
- visual recognition
- document retrieval
- speech recognition
- training data
- object recognition
- retrieval model
- n-gram
- information retrieval
- query expansion
- multimedia
- computer vision
- test collection
- training examples
- video data
- video content
- video retrieval
- video frames
- multimedia databases
- visual data
- feature extraction