Towers of Babel: Combining Images, Language, and 3D Geometry for Learning Multimodal Vision.
Xiaoshi Wu, Hadar Averbuch-Elor, Jin Sun, Noah Snavely
Published in: ICCV (2021)
Keyphrases
- three dimensional
- image database
- learning algorithm
- input image
- image analysis
- reinforcement learning
- image data
- image features
- geometric constraints
- real time
- learning process
- object recognition
- image retrieval
- feature points
- computer vision and graphics
- language acquisition
- lighting conditions
- image segmentation
- face recognition
- supervised learning
- image processing
- programming language
- image classification
- multi modal
- language learning
- image collections
- computer vision
- geometric structure
- object oriented programming
- ground truth
- multiresolution