READ-BAD: A New Dataset and Evaluation Scheme for Baseline Detection in Archival Documents.
Tobias Grüning, Roger Labahn, Markus Diem, Florian Kleber, Stefan Fiel. Published in: CoRR (2017)
Keyphrases
- detection scheme
- electronic documents
- information retrieval
- automatic detection
- object detection
- relevant documents
- document collections
- detection method
- digital libraries
- evaluation method
- false positives
- database
- web documents
- information retrieval systems
- relevance judgements
- xml documents
- detection rate
- keywords
- document classification
- vector space model
- synthetic datasets
- relevance assessments
- rule interestingness measures
- multi document summarization
- relevance judgments
- digital objects
- document clustering
- semantic information
- natural language processing
- metadata