Visual Transformers: Token-based Image Representation and Processing for Computer Vision.
Bichen Wu, Chenfeng Xu, Xiaoliang Dai, Alvin Wan, Peizhao Zhang, Masayoshi Tomizuka, Kurt Keutzer, Peter Vajda
Published in: CoRR (2020)
Keyphrases
- image representation
- computer vision
- object recognition
- image features
- image classification
- scene categorization
- multiscale
- visual information processing
- image content
- bag of words
- image retrieval
- visual words
- visual features
- quadtree
- receptive fields
- visual information
- pattern recognition
- representation scheme
- sparse coding
- bag of features
- object detection
- visual vocabulary
- feature space
- scene recognition
- bag of visual words
- feature representations
- low level
- scene categories
- image classification and retrieval
- low level features
- image processing
- object categories
- image segmentation
- signal processing
- natural images
- pose estimation
- visual content
- sparse representation
- web images
- scene understanding
- CBIR systems
- vision system
- action recognition
- spatial layout
- image database
- contourlet transform
- query processing
- face recognition
- high level