Text Content Reliability Estimation in Web Documents: A New Proposal.
Luis Sanz, Héctor Allende, Marcelo Mendoza
Published in: CICLing (2) (2012)
Keyphrases
- web documents
- text content
- web pages
- keywords
- focused crawling
- semi-structured
- website
- user-generated
- textual information
- web content
- search engine
- information extraction
- web search
- html documents
- web search engines
- document classification
- web data
- vector space model
- topic modeling
- text corpus
- user queries
- link analysis
- n-gram
- web graph
- data analysis
- anchor text
- text classification
- topic specific
- information retrieval systems
- text mining