docExtractor: An off-the-shelf historical document element extraction.
Tom Monnier, Mathieu Aubry
Published in: ICFHR (2020)
Keyphrases
- information retrieval
- automatic extraction
- document images
- document clustering
- historical documents
- web documents
- machine learning
- historical data
- document collections
- document classification
- document processing
- text documents
- information extraction
- document retrieval
- keywords
- website
- structured documents
- user queries
- tf-idf
- document structure
- automatically extracting
- semantic information
- web search