Main Content Extraction from Web Documents Using Text Block Context.
Myungwon Kim, Youngjin Kim, Wonmoon Song, Ara Khil
Published in: DEXA (2) (2013)
Keyphrases
- web documents
- information extraction
- web pages
- keywords
- web search engines
- semi structured
- link structure
- textual information
- information retrieval
- extraction rules
- html documents
- contextual information
- vector space model
- text documents
- web content
- document classification
- document representation
- automatic extraction
- web data
- multi view
- web search
- search engine
- unstructured text
- machine learning
- wrapper induction