Layout-Aware Semi-automatic Information Extraction for Pharmaceutical Documents.
Simon Harmata, Katharina Hofer-Schmitz, Phuong-Ha Nguyen, Christoph Quix, Bujar Bakiu
Published in: DILS (2017)
Keyphrases
- semi-automatic
- information extraction
- free text
- text documents
- web documents
- information retrieval
- fully automatic
- unstructured documents
- textual data
- wrapper generation
- gold standard
- semi-automatically
- document collections
- text mining
- natural language processing
- information retrieval systems
- precision and recall
- page layout
- natural language text
- named entities
- question answering
- domain ontology
- named entity recognition
- machine learning
- structured data
- labor-intensive
- relevant documents
- semantic annotation
- semi-structured
- ontology development
- xml documents
- machine translation
- metadata
- web data
- relation extraction
- manual annotation
- web mining
- semantic information
- extraction rules