ViT-CoMer: Vision Transformer with Convolutional Multi-scale Feature Interaction for Dense Predictions.
Chunlong Xia, Xinliang Wang, Feng Lv, Xin Hao, Yifeng Shi
Published in: CoRR (2024)
Keyphrases
- multiscale
- image processing
- human computer interaction
- vision system
- fuzzy logic
- real time
- multiple scales
- visual perception
- natural images
- image features
- scale space
- densely sampled
- human interaction
- image representation
- wavelet transform
- feature vectors
- image segmentation
- computer vision
- edge detection
- multiresolution
- fault diagnosis
- sparse coding
- coarse to fine
- clustering algorithm
- human robot interaction
- human vision
- artificial intelligence
- stereo correspondence
- deep learning
- data sets