Information Extraction in Domain and Generic Documents: Findings from Heuristic-based and Data-driven Approaches.
Shiyu Yuan, Carlo Lipizzi
Published in: CoRR (2023)
Keyphrases
- information extraction
- data driven approaches
- free text
- information retrieval
- web documents
- text documents
- domain specific
- unstructured documents
- natural language processing
- information retrieval systems
- text mining
- document collections
- precision and recall
- textual data
- document clustering
- metadata
- xml documents
- named entities
- domain independent
- machine learning
- legal documents
- unstructured text
- case study
- document retrieval
- keywords
- relevant documents
- conditional random fields
- vector space model
- document representation
- semantic information
- text analysis
- natural language text
- structured data
- natural language
- web search
- knowledge discovery