Compressing Semi-Structured Text using Hierarchical Phrase Identification.
Craig G. Nevill-Manning, Ian H. Witten, Dan R. Olsen
Published in: Data Compression Conference (1996)
Keyphrases
- semi structured
- free text
- web documents
- text mining
- structured data
- unstructured text
- information extraction
- content and structure
- keywords
- information integration
- data model
- semi structured documents
- semi structured data
- html pages
- textual data
- data extraction
- structured knowledge
- web data
- multiword
- text data
- text documents
- wrapper generation
- data collections
- web data sources
- information retrieval
- html documents
- web data extraction
- database
- web sources
- noun phrases
- logic programs
- natural language processing
- knowledge rich
- unstructured data
- data integration
- knowledge discovery
- data mining