Towards Generalisable Video Moment Retrieval: Visual-Dynamic Injection to Image-Text Pre-Training.
Dezhao Luo, Jiabo Huang, Shaogang Gong, Hailin Jin, Yang Liu
Published in: CoRR (2023)
Keyphrases
- image retrieval
- web images
- visual data
- information retrieval
- semantic content
- video search
- image data
- semantic labels
- image content
- visual appearance
- news video
- visual cues
- textual descriptions
- visual features
- medical image retrieval
- visual concepts
- multimedia documents
- low level
- visual descriptors
- text retrieval
- image features
- text queries
- pre trained
- visual similarity
- input image
- textual and visual information
- textual query
- multimedia search
- image description
- scanned documents
- visually similar
- visual input
- image collections
- image representation
- image classification
- image database
- multiscale
- visual and textual features
- image search
- visual information
- image regions
- video data
- video collections
- auto annotation
- video database
- visual content
- content based retrieval
- key frames
- relevance feedback
- high resolution
- multimedia
- document analysis
- video shots
- semantic concepts
- human visual system
- video streams
- information retrieval systems