docExtractor: An off-the-shelf historical document element extraction.

Tom Monnier, Mathieu Aubry

Published in: ICFHR (2020)

Keyphrases

information retrieval
automatic extraction
document images
document clustering
historical documents
web documents
machine learning
historical data
document collections
document classification
document processing
text documents
information extraction
document retrieval
keywords
website
structured documents
user queries
tf-idf
document structure
automatically extracting
semantic information
web search