Machine Learning of Generalized Document Templates for Data Extraction.
Janusz Wnek
Published in: Document Analysis Systems (2002)
Keyphrases
- data extraction
- machine learning
- information extraction
- html pages
- semi structured
- web data extraction
- web documents
- data integration
- web pages
- information retrieval
- web sources
- natural language processing
- document collections
- information retrieval systems
- text mining
- html documents
- retrieval systems
- data analysis
- data mining
- distributed systems
- semantic information
- data sets
- artificial intelligence
- databases