TextCoT: Zoom In for Enhanced Multimodal Text-Rich Image Understanding.

Bozhi Luan, Hao Feng, Hong Chen, Yonghui Wang, Wengang Zhou, Houqiang Li

Published in: CoRR (2024)

Keyphrases

image understanding
image interpretation
computer vision
object recognition
information retrieval
computational vision
image segmentation
image analysis
image analysis and computer vision
control structure
pattern recognition
multimodal
object detection
image annotation
image processing
text documents
text retrieval
neural network
keywords
high level
multiple modalities
real world
image restoration
semantic information
training set