Noise Elimination from the Web Documents by Using URL Paths and Information Redundancy.
Byeong Ho Kang, Yang Sok Kim
Published in: IKE (2006)
Keyphrases
- web documents
- information redundancy
- noise elimination
- web pages
- edge detection
- mathematical morphology
- image quality
- keywords
- semi-structured
- web search engines
- information extraction
- mutual information
- link structure
- html documents
- web data
- vector space model
- structured documents
- web directories
- high quality
- topic-specific
- anchor text
- web content
- document representation