docExtractor: An off-the-shelf historical document element extraction.

Tom Monnier, Mathieu Aubry

Published in: CoRR (2020)

Keyphrases

document images
web documents
information extraction
historical data
information retrieval systems
information retrieval
structured documents
document collections
retrieval systems
relevant documents
xml elements
digital documents
language model
data structure
text documents
keywords
vector space model
database
historical documents