Extraction of the contents in the web texts by content-density distribution.
Saori Kitahara, Koya Tamura, Kenji Hatano
Published in: Int. J. Knowl. Eng. Soft Data Paradigms (2011)
Keyphrases
- web content
- density distribution
- text content
- web information
- web pages
- content extraction
- web documents
- website
- web resources
- textual content
- content similarity
- data extraction
- user generated content
- news pages
- web data
- html pages
- multimedia
- metadata
- web browsing
- information extraction
- density function
- web mining
- keywords
- page contents
- textual features
- arbitrary shape
- learning content
- em algorithm