The Influence of Pre-processing on the Estimation of Readability of Web Documents.
João Rafael De Moura Palotti, Guido Zuccon, Allan Hanbury
Published in: CIKM (2015)
Keyphrases
- web documents
- preprocessing
- web pages
- semi-structured
- web search engines
- document classification
- information extraction
- html documents
- vector space model
- web content
- web data
- focused crawling
- document representation
- keywords
- machine learning
- textual information
- web logs
- link structure
- feature extraction
- semi-structured data
- databases