Towards Generalisable Video Moment Retrieval: Visual-Dynamic Injection to Image-Text Pre-Training.
Dezhao Luo, Jiabo Huang, Shaogang Gong, Hailin Jin, Yang Liu
Published in: CVPR (2023)
Keyphrases
- image retrieval
- web images
- visual concepts
- medical image retrieval
- video search
- visual cues
- news video
- visual features
- input image
- semantic content
- image data
- visual appearance
- low level
- text retrieval
- textual descriptions
- visual data
- visual similarity
- image description
- textual query
- multiscale
- image content
- image collections
- image classification
- visual descriptors
- content-based video retrieval
- image database
- information retrieval
- visual information
- semantic labels
- visually similar
- textual and visual information
- multimedia data
- video streams
- video sequences
- multimedia search
- text queries
- image representation
- multimedia documents
- video retrieval
- video content
- pre-trained
- high resolution
- visual input
- multimedia
- video collections
- text detection
- key frames
- image search
- video database
- video shots
- document analysis