ViLTA: Enhancing Vision-Language Pre-training through Textual Augmentation.

Weihan Wang, Zhen Yang, Bin Xu, Juanzi Li, Yankui Sun

Published in: CoRR (2023)

Keyphrases

natural language
real time
programming language
computer vision
image processing
language learning
specification language
operational semantics
neural network
information retrieval
multimedia
training process
textual information
conceptual graphs
language processing
visual field