GRIT: Faster and Better Image captioning Transformer Using Dual Visual Features.
Van-Quang Nguyen, Masanori Suganuma, Takayuki Okatani
Published in: CoRR (2022)
Keyphrases
- visual features
- image classification
- image collections
- image retrieval
- image categorization
- visual appearance
- global features
- web images
- semantic gap
- low level
- visually similar
- image search
- visual content
- visual descriptors
- visual information
- low level visual features
- image content
- image features
- image representation
- visual similarity
- labeled images
- bag of features
- visual data
- image annotation
- image similarity
- sample images
- multiscale
- input image
- visual properties
- image data
- visual patterns
- sift features
- image matching
- low level features
- visual attributes
- key frames
- information retrieval
- semantic concepts
- automatic image annotation
- visual words
- spatial information
- image regions
- spatial relationships
- image database
- relevance feedback
- feature space
- keywords