Towers of Babel: Combining Images, Language, and 3D Geometry for Learning Multimodal Vision.

Xiaoshi Wu, Hadar Averbuch-Elor, Jin Sun, Noah Snavely

Published in: CoRR (2021)

Keyphrases

image data
three-dimensional
image registration
ground truth
learning process
image retrieval
learning algorithm
combining multiple
language acquisition
real time
multiple images
computer vision
lighting conditions
multimodal
image quality
computer vision and graphics
geometric information
image matching
feature points
image analysis
image classification
edge detection
input image
image features
reinforcement learning
natural language
active learning
object recognition