VimTS: A Unified Video and Image Text Spotter for Enhancing the Cross-domain Generalization.
Yuliang Liu, Mingxin Huang, Hao Yan, Linger Deng, Weijia Wu, Hao Lu, Chunhua Shen, Lianwen Jin, Xiang Bai
Published in: CoRR (2024)
Keyphrases
- cross domain
- image features
- image classification
- image content
- image retrieval
- input image
- image representation
- image segmentation
- domain adaptation
- video data
- web images
- knowledge transfer
- image regions
- video sequences
- similarity measure
- information retrieval
- text categorization
- transfer learning
- image collections
- object recognition
- keywords
- textual descriptions
- text mining
- video frames
- key frames
- video content