Large-scale Vision-Language Models Learn Super Images for Efficient and High-Performance Partially Relevant Video Retrieval.
Taichi Nishimura, Shota Nakada, Masayoshi Kondo
Published in: CoRR (2023)
Keyphrases
- video retrieval
- language model
- semantic gap
- video collections
- content-based retrieval
- language modeling
- image database
- n-gram
- video database
- three-dimensional
- image features
- image data
- document retrieval
- retrieval model
- image retrieval
- query expansion
- speech recognition
- image regions
- image understanding
- image collections
- visual content
- video data
- concept detection
- object recognition
- probabilistic model
- video clips
- video content
- image classification
- image content
- relevance model
- information retrieval