Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners.
Zhenhailong Wang, Manling Li, Ruochen Xu, Luowei Zhou, Jie Lei, Xudong Lin, Shuohang Wang, Ziyi Yang, Chenguang Zhu, Derek Hoiem, Shih-Fu Chang, Mohit Bansal, Heng Ji
Published in: CoRR (2022)
Keyphrases
- language model
- image descriptors
- language learners
- video data
- video content
- video sequences
- key frames
- speech recognition
- n-gram
- probabilistic model
- similarity measure
- retrieval model
- language learning
- document retrieval
- information retrieval
- video frames
- video retrieval
- test collection
- pose estimation
- query expansion
- multimedia
- local binary pattern
- distance function
- query terms
- visual features
- image structure
- machine learning
- image analysis
- search engine
- foreign language