Vision-language pre-training via modal interaction.
Hang Cheng, Hehui Ye, Xiaofei Zhou, Ximeng Liu, Fei Chen, Meiqing Wang
Published in: Pattern Recognit. (2024)
Keyphrases
- human computer interaction
- language learning
- computer vision
- language processing
- vision system
- real time
- programming language
- natural language
- machine learning
- training process
- database
- visual perception
- operational semantics
- human communication
- knowledge base
- image processing
- social networks
- artificial intelligence
- training samples
- training examples
- data sets
- visual field