Enhancing Visual Grounding in Vision-Language Pre-Training With Position-Guided Text Prompts.
Alex Jinpeng Wang, Pan Zhou, Mike Zheng Shou, Shuicheng Yan
Published in: IEEE Trans. Pattern Anal. Mach. Intell. (2024)
Keyphrases
- language generation
- visual perception
- English text
- text to speech synthesis
- computational linguistics
- human vision
- visual processing
- English language
- human language
- native language
- visual input
- visual information
- database
- language processing
- visual field
- vision system
- real time
- language learning
- text retrieval
- text to speech
- programming language
- visual query language
- web images
- word meanings
- low level
- computer vision
- natural language
- linguistic analysis
- information retrieval
- semantic content
- training set
- text classification
- training samples
- training process
- visual scene
- visual search
- semantic representations
- machine translation system
- text generation
- visual features
- supervised learning
- position and orientation
- training corpus
- neural network
- training examples
- language specific
- image processing
- syntactic categories
- text mining
- video search