VoLTA: Vision-Language Transformer with Weakly-Supervised Local-Feature Alignment.
Shraman Pramanick
Li Jing
Sayan Nag
Jiachen Zhu
Hardik Shah
Yann LeCun
Rama Chellappa
Published in:
Trans. Mach. Learn. Res. (2023)
Keyphrases
weakly supervised
relation extraction
computer vision
topic models
object class
superpixels
semi supervised
image features
natural language
object detection
question answering
named entities
higher order
object detectors
natural images
feature vectors
viewpoint
object recognition
image processing