ViLTA: Enhancing Vision-Language Pre-training through Textual Augmentation.
Weihan Wang, Zhen Yang, Bin Xu, Juanzi Li, Yankui Sun
Published in: ICCV (2023)
Keyphrases
- natural language
- computer vision
- language learning
- training examples
- training set
- training phase
- language processing
- supervised learning
- decision trees
- programming language
- online learning
- real time
- metadata
- data sets
- formal language
- computational linguistics
- human generated
- training algorithm
- free text
- vision system
- multimedia
- image processing