VEGA: Learning Interleaved Image-Text Comprehension in Vision-Language Large Models.
Chenyu Zhou, Mengdan Zhang, Peixian Chen, Chaoyou Fu, Yunhang Shen, Xiawu Zheng, Xing Sun, Rongrong Ji
Published in: CoRR (2024)
Keyphrases
- prior knowledge
- learning algorithm
- learning models
- image retrieval
- image data
- image analysis
- image content
- Bayesian framework
- input image
- image features
- high resolution
- learning process
- computer vision
- image segmentation
- image classification
- single image
- learned models
- accurate models
- visual perception
- information retrieval
- language learning
- cognitive models
- learning tasks
- image representation
- supervised learning
- probabilistic model
- image processing
- document images
- image regions
- image collections
- natural language