Transformer vision-language tracking via proxy token guided cross-modal fusion.
Haojie Zhao
Xiao Wang
Dong Wang
Huchuan Lu
Xiang Ruan
Published in:
Pattern Recognit. Lett. (2023)
Keyphrases
cross modal
multi modal
computer vision
image retrieval
multimedia retrieval
natural language
data fusion
visual data
visual similarity
multimedia databases
visual recognition
perceptual information
object recognition