ViT-CoMer: Vision Transformer with Convolutional Multi-scale Feature Interaction for Dense Predictions.

Chunlong Xia, Xinliang Wang, Feng Lv, Xin Hao, Yifeng Shi

Published in: CoRR (2024)

Keyphrases

multiscale
image processing
human computer interaction
vision system
fuzzy logic
real time
multiple scales
visual perception
natural images
image features
scale space
densely sampled
human interaction
image representation
wavelet transform
feature vectors
image segmentation
computer vision
edge detection
multiresolution
fault diagnosis
sparse coding
coarse to fine
clustering algorithm
human robot interaction
human vision
artificial intelligence
stereo correspondence
deep learning
data sets