ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision.
Wonjae Kim
Bokyung Son
Ildoo Kim
Published in:
CoRR (2021)
Keyphrases
image processing
computer vision
programming language
input image
language learning
real time
active learning
vision system
language processing
moving objects
fuzzy logic
pattern languages
formal language
power transformers