Leveraging per Image-Token Consistency for Vision-Language Pre-training.
Yunhao Gou, Tom Ko, Hansi Yang, James T. Kwok, Yu Zhang, Mingxuan Wang
Published in: CVPR (2023)
Keyphrases
- input image
- multiscale
- image data
- image features
- image classification
- template matching
- low level
- visual perception
- image retrieval
- image analysis
- region of interest
- test images
- single image
- pixel values
- programming language
- vision system
- image content
- image set
- image collections
- low level image processing
- image matching
- keypoints
- image representation
- feature points
- edge detection
- training set
- computer vision
- supervised learning
- natural language
- similarity measure
- feature extraction
- image pixels
- human vision
- low level vision
- real time