DiT: Self-supervised Pre-training for Document Image Transformer.
Junlong Li, Yiheng Xu, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei
Published in: CoRR (2022)
Keyphrases
- document images
- document analysis
- document image analysis
- page layout
- document processing
- document image understanding
- printed documents
- page segmentation
- optical character recognition
- language identification
- word level
- scanned documents
- camera captured document
- image processing
- document layout
- binarization method
- text analysis
- gray scale
- image retrieval