Information Extraction from Semi-structured WEB Page Based on DOM Tree and its Application in Scientific Literature Statistical Analysis System.
Weidong Li, Yibing Dong, RuiJiang Wang, Hongxia Tian
Published in: SSME (2009)
Keyphrases
- semi structured
- dom tree
- scientific literature
- information extraction
- statistical analysis
- web documents
- text mining
- html documents
- web data extraction
- text processing
- unstructured text
- data extraction
- natural language processing
- structured data
- information retrieval
- web pages
- web data
- biomedical literature
- information integration
- digital libraries
- semi structured data
- semistructured data
- text classification
- machine learning
- web information extraction
- textual data
- text documents
- named entities
- natural language
- web mining
- web content
- data sets
- topic models
- keywords
- data mining
- real world
- data model
- artificial intelligence
- search engine
- document clustering
- statistical methods
- database systems
- xml documents