Probing Inter-modality: Visual Parsing with Self-Attention for Vision-and-Language Pre-training.
Hongwei Xue, Yupan Huang, Bei Liu, Houwen Peng, Jianlong Fu, Houqiang Li, Jiebo Luo
Published in: NeurIPS (2021)
Keyphrases
- visual field
- natural language
- visual perception
- selective attention
- visual attention
- visual processing
- human vision
- word order
- natural language processing
- visual features
- linguistic analysis
- real time
- language learning
- visual input
- visual languages
- training process
- context-dependent
- visual query language
- visual information
- test set
- medical images
- programming language
- computer vision
- training set
- training samples
- biological vision
- pre-attentive
- syntactic parsing
- stochastic context-free grammars
- language understanding
- language processing
- vision system
- multi-modal
- training examples
- hidden Markov models
- visual scene
- low level
- context-free
- high level
- information extraction
- image processing
- supervised learning
- phrase structure
- machine learning
- neural network