Self-supervised video pretraining yields strong image representations.
Nikhil Parthasarathy, S. M. Ali Eslami, João Carreira, Olivier J. Hénaff
Published in: CoRR (2022)
Keyphrases
- image representation
- multiscale
- image classification
- bag of words
- video data
- image content
- object recognition
- video sequences
- visual words
- receptive fields
- scene classification
- video content
- feature representations
- representation scheme
- sparse coding
- image features
- quadtree
- video frames
- region segmentation
- spatial pyramid
- low level features
- CBIR systems
- video retrieval
- image retrieval
- bag of features
- video surveillance
- image processing
- scene recognition
- low level
- visual vocabulary
- image classification and retrieval
- visual recognition tasks