docExtractor: An off-the-shelf historical document element extraction.
Tom Monnier, Mathieu Aubry
Published in: CoRR (2020)
Keyphrases
- document images
- web documents
- information extraction
- historical data
- information retrieval systems
- information retrieval
- structured documents
- document collections
- retrieval systems
- relevant documents
- xml elements
- digital documents
- language model
- data structure
- text documents
- keywords
- vector space model
- database
- historical documents