CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data.
Sachin Mehta, Maxwell Horton, Fartash Faghri, Mohammad Hossein Sekhavat, Mahyar Najibi, Mehrdad Farajtabar, Oncel Tuzel, Mohammad Rastegari
Published in: CoRR (2024)
Keyphrases
- recognition accuracy
- web scale
- web images
- text data
- million images
- image data
- image search
- image retrieval
- image content
- image features
- recognition rate
- face recognition
- low level
- image classification
- multiscale
- image representation
- image segmentation
- visual features
- text classification
- image annotation
- input image
- image collections
- image regions
- databases
- feature extraction
- web pages
- high level
- information retrieval
- image set
- training set