FitCLIP: Refining Large-Scale Pretrained Image-Text Models for Zero-Shot Video Understanding Tasks.
Santiago Castro, Fabian Caba Heilbron. Published in: CoRR (2022)
Keyphrases
- image data
- image content
- input image
- single image
- bayesian framework
- multiscale
- random fields
- image representation
- image analysis
- high resolution
- image features
- image classification
- key frames
- image frames
- probabilistic model
- image retrieval
- textual descriptions
- video images
- image segmentation
- test images
- video content
- image regions
- video analysis
- web images
- static images
- feature points
- text information
- video streams
- visual data
- segmentation method
- video search
- computer vision
- semantic labels