Generation of a Large-Scale Line Image Dataset with Ground Truth Texts from Page-Level Autograph Documents.
Ayumu Nagai
Published in: ICONIP (1) (2021)
Keyphrases
- image dataset
- ground truth
- keywords
- text documents
- image database
- website
- automatic classification
- information retrieval
- document collections
- page layout
- web pages
- text mining
- text classification
- electronic documents
- natural language text
- image collections
- web documents
- information extraction
- high quality
- image annotation
- higher level
- xml documents
- object recognition
- document type
- computer vision