GRIT: Faster and Better Image Captioning Transformer Using Dual Visual Features.
Van-Quang Nguyen, Masanori Suganuma, Takayuki Okatani
Published in: ECCV (36) (2022)
Keyphrases
- visual features
- image classification
- image collections
- image retrieval
- visual appearance
- image categorization
- web images
- visual information
- image search
- low level
- visually similar
- global features
- low level visual features
- visual content
- labeled images
- image content
- visual data
- sample images
- bag of features
- semantic gap
- visual patterns
- visual descriptors
- visual properties
- image representation
- image features
- visual similarity
- image data
- low level features
- image annotation
- visual attributes
- image similarity
- semantic concepts
- input image
- multiscale
- test images
- SIFT features
- text queries
- automatic image annotation
- bag of words
- keywords
- feature extraction
- video shots
- semantically meaningful
- key frames
- spatial information
- image regions
- keypoints
- natural images
- image database
- relevance feedback
- high level
- image segmentation