Extracting article text from the web with maximum subsequence segmentation.
Jeff Pasternack, Dan Roth
Published in: WWW (2009)
Keyphrases
- web documents
- text segmentation
- textual data
- text information
- information retrieval and extraction
- data extraction
- level set
- website
- topic segmentation
- web applications
- automatically extracted
- text detection
- segmentation algorithm
- image segmentation
- database
- information sources
- information retrieval
- text extraction
- pattern matching
- digital documents
- web pages
- automatically extracting
- web resources
- text content
- semantic web
- medical images
- text documents
- keywords
- segmentation method
- textual features
- text mining
- multilingual
- line extraction
- semantic markup
- complex background
- web images
- dynamic time warping
- web mining
- web technologies
- free text
- shape prior
- region growing
- linked data
- energy function
- structured data
- text classification
- edge detection
- information extraction
- image analysis