Basic semantic units based web page content extraction.
Jingqi Wang, Qingcai Chen, Xiaolong Wang, Hongzhi Guo
Published in: SMC (2008)
Keyphrases
- content extraction
- web pages
- html documents
- text content
- semantic information
- website
- web news
- web documents
- semantic features
- digital archives
- search engine
- web data
- natural language
- domain knowledge
- semantic similarity
- multimedia information retrieval
- semantically related
- database
- information extraction
- high level
- domain ontology
- wordnet
- semantic web
- text classification