Towers of Babel: Combining Images, Language, and 3D Geometry for Learning Multimodal Vision.
Xiaoshi Wu, Hadar Averbuch-Elor, Jin Sun, Noah Snavely
Published in: CoRR (2021)
Keyphrases
- image data
- three dimensional
- image registration
- ground truth
- learning process
- image retrieval
- learning algorithm
- combining multiple
- language acquisition
- real time
- multiple images
- computer vision
- lighting conditions
- multi modal
- image quality
- computer vision and graphics
- geometric information
- image matching
- feature points
- image analysis
- image classification
- edge detection
- input image
- image features
- reinforcement learning
- natural language
- active learning
- object recognition