CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification.
Chun-Fu (Richard) Chen, Quanfu Fan, Rameswar Panda
Published in: ICCV (2021)
Keyphrases
- image classification
- multiscale
- image representation
- image features
- visual features
- bag of words
- scale space
- vision system
- fuzzy logic
- image processing
- computer vision
- feature extraction
- coarse to fine
- visual attention
- focus of attention
- fault diagnosis
- edge detection
- human vision
- image analysis
- power system
- real time
- multiple scales
- sparse representation
- object recognition
- multi label
- class specific
- visual field
- deep structure
- remotely sensed data
- visual words
- keypoints
- natural images
- wavelet transform
- decision making
- information retrieval
- data sets