Bench-Marking Information Extraction in Semi-Structured Historical Handwritten Records.
Animesh Prasad, Hervé Déjean, Jean-Luc Meunier, Max Weidemann, Johannes Michael, Gundram Leifert
Published in: CoRR (2018)
Keyphrases
- semi structured
- information extraction
- historical documents
- structured data
- free text
- data extraction
- natural language processing
- web documents
- text mining
- handwriting recognition
- character recognition
- web data
- historical manuscripts
- historical information
- semi structured data
- text documents
- information retrieval
- information integration
- semi structured documents
- text processing
- wrapper generation
- databases
- data model
- word spotting
- word recognition
- machine learning
- xml databases
- web data extraction
- unstructured data
- unstructured text
- knowledge rich
- textual data
- web sources
- web mining
- natural language
- content and structure
- html documents
- data collections
- expert systems
- web data sources
- database
- structured knowledge
- website