Extraction of Web Texts Using Content-Density Distribution.
Saori Kitahara, Koya Tamura, Kenji Hatano
Published in: AIRS (2011)
Keyphrases
- density distribution
- web content
- web documents
- user generated content
- web resources
- text content
- website
- data extraction
- web pages
- information extraction
- textual features
- content management
- uniform distribution
- arbitrary shape
- keywords
- multi dimensional
- data streams
- relevant content
- metadata
- web images
- search engine
- data structure
- extraction rules
- news pages