Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners.
Zhenhailong Wang, Manling Li, Ruochen Xu, Luowei Zhou, Jie Lei, Xudong Lin, Shuohang Wang, Ziyi Yang, Chenguang Zhu, Derek Hoiem, Shih-Fu Chang, Mohit Bansal, Heng Ji
Published in: NeurIPS (2022)
Keyphrases
- language model
- image descriptors
- language learners
- video content
- video data
- video sequences
- key frames
- speech recognition
- probabilistic model
- n-gram
- document retrieval
- retrieval model
- test collection
- information retrieval
- multimedia
- language learning
- query expansion
- similarity measure
- video frames
- query terms
- image structure
- pose estimation
- local binary pattern
- visual features
- distance function
- video retrieval
- retrieval process
- retrieval effectiveness
- image sequences
- image analysis
- high level
- active learning