Review - Record-Boundary Discovery in Web Documents.
M. Tamer Özsu
Published in: ACM SIGMOD Digit. Rev. (2000)
Keyphrases
- web documents
- information extraction
- semi-structured
- keywords
- web pages
- document classification
- web search engines
- knowledge discovery
- structured documents
- web content
- textual information
- link structure
- focused crawling
- database
- web logs
- html documents
- geographic information
- social annotations
- web search
- natural language processing
- vector space model
- document representation