TextCoT: Zoom In for Enhanced Multimodal Text-Rich Image Understanding.
Bozhi Luan, Hao Feng, Hong Chen, Yonghui Wang, Wengang Zhou, Houqiang Li
Published in: CoRR (2024)
Keyphrases
- image understanding
- image interpretation
- computer vision
- object recognition
- information retrieval
- computational vision
- image segmentation
- image analysis
- image analysis and computer vision
- control structure
- pattern recognition
- multi modal
- object detection
- image annotation
- image processing
- text documents
- text retrieval
- neural network
- keywords
- high level
- multiple modalities
- real world
- image restoration
- semantic information
- training set