VoLTA: Vision-Language Transformer with Weakly-Supervised Local-Feature Alignment.
Shraman Pramanick
Li Jing
Sayan Nag
Jiachen Zhu
Hardik Shah
Yann LeCun
Rama Chellappa
Published in:
CoRR (2022)
Keyphrases
weakly supervised
relation extraction
topic models
superpixels
object class
computer vision
named entities
natural language
image features
semi supervised
domain specific
feature vectors
image processing
feature set
multiscale
automatic extraction
learning algorithm