VLDeformer: Learning Visual-Semantic Embeddings by Vision-Language Transformer Decomposing.
Lisai Zhang, Hongfa Wu, Qingcai Chen, Yimeng Deng, Zhonghua Li, Dejiang Kong, Zhao Cao, Joanna Siebert, Yunpeng Han
Published in: CoRR (2021)
Keyphrases
- learning algorithm
- learning process
- natural language
- computer vision
- knowledge acquisition
- learning systems
- context dependent
- real time
- learning tasks
- language acquisition
- visual perception
- background knowledge
- supervised learning
- reinforcement learning
- semantic web
- language learning
- prior knowledge
- positive examples
- neural network