Record-Boundary Discovery in Web Documents.
David W. Embley, Y. S. Jiang, Yiu-Kai Ng. Published in: SIGMOD Conference (1999)
Keyphrases
- web documents
- information extraction
- document classification
- web search engines
- web pages
- semi-structured
- keywords
- web data
- vector space model
- web content
- knowledge discovery
- database
- html documents
- data mining
- geographic information
- focused crawling
- machine learning
- structured documents
- unstructured documents
- web logs
- natural language
- website
- metadata
- information retrieval
- databases