FitCLIP: Refining Large-Scale Pretrained Image-Text Models for Zero-Shot Video Understanding Tasks.
Santiago Castro, Fabian Caba
Published in: BMVC (2022)
Keyphrases
- image features
- random fields
- multiscale
- input image
- image classification
- image frames
- image segmentation
- image retrieval
- single image
- image data
- bayesian framework
- test images
- image regions
- spatial information
- static images
- visual data
- image representation
- edge detection
- image analysis
- low level
- segmentation method
- semantic labels
- keywords
- image content
- video frames
- video search
- image collections
- high resolution
- caption text
- visual concepts
- web images
- video analysis
- video retrieval
- motion estimation
- video content
- key frames
- video streams
- information retrieval
- generative model
- video data