Automatic Training Set Generation for Better Historic Document Transcription and Compression.
Gabriel de França Pereira e Silva, Rafael Dueire Lins, Cesar Gomes
Published in: Document Analysis Systems (2014)
Keyphrases
- training set
- test set
- semi automatic
- training data
- active learning
- inverted lists
- image compression
- information retrieval systems
- nearest neighbor
- information retrieval
- generation process
- data sets
- data compression
- relevant documents
- document images
- cross validation
- document collections
- test data
- document retrieval
- supervised learning
- keywords
- retrieval systems
- training samples
- document clustering
- compression algorithm
- compression ratio
- image quality
- query processing