Statistical Analysis of Web Documents: A Proposal and a Case Study.
Pierpaolo Vittorini, Paolino Di Felice
Published in: DEXA Workshop (2001)
Keyphrases
- web documents
- statistical analysis
- semi-structured
- web pages
- information extraction
- web search engines
- document classification
- textual information
- web content
- vector space model
- document representation
- link structure
- HTML documents
- keywords
- web data
- structured documents
- web logs
- information retrieval
- website
- web directories
- focused crawling