A hybrid approach for extracting informative content from web pages.
Erdinç Uzun, Hayri Volkan Agun, Tarik Yerlikaya. Published in: Inf. Process. Manag. (2013)
Keyphrases
- web pages
- web content
- textual content
- web documents
- dynamic content
- website
- data extraction
- search engine
- information content
- dynamically generated
- hyperlink structure
- web resources
- data records
- web portals
- web server
- web search engines
- keywords
- multimedia content
- text content
- html pages
- content features
- related web pages
- browsing experience
- page content
- geographical locations
- web page classification
- web crawler
- plain text
- web spam
- topic specific
- web browser
- web data
- multimedia
- social media