From Historical Newspapers to Machine-Readable Data: The Origami OCR Pipeline.
Bernhard Liebl, Manuel Burghardt
Published in: CHR (2020)
Keyphrases
- data sets
- data collection
- raw data
- databases
- historical data
- experimental data
- training data
- database
- data analysis
- data sources
- image data
- data processing
- high quality
- decision trees
- original data
- data quality
- data structure
- high dimensional data
- synthetic data
- data distribution
- data acquisition
- feature selection
- data objects
- information retrieval
- document processing