The Lehigh Steel Collection: a new open dataset for document recognition research.
Barri Bruno, Daniel P. Lopresti
Published in: DRR (2014)
Keyphrases
- document collections
- document analysis
- recognition rate
- recognition accuracy
- pattern recognition
- object recognition
- information retrieval systems
- database
- information retrieval
- document clustering
- activity recognition
- character recognition
- printed documents
- text collections
- document images
- web documents
- text documents
- document retrieval
- document classification
- recognition algorithm
- visual recognition
- document set
- human activities
- retrieval systems
- text classification
- trec genomics
- relevant documents
- search engine
- related documents
- word recognition
- text lines
- handwritten digits
- feature extraction
- keywords
- recognition process
- structured documents
- semantic information
- synthetic datasets
- vector space model
- user queries