Efficient Approach for Repeated Patterns Mining Based on Indent Shape of HTML Documents.
Yanxu Zhu, Gang Yin, Huaimin Wang, Dianxi Shi, Xiang Rao, Lin Yuan
Published in: CyberC (2011)
Keyphrases
- repeated patterns
- html documents
- web documents
- semi structured
- tree matching
- automatic extraction
- significant features
- semantic information
- data mining
- web content
- semistructured data
- web pages
- text mining
- knowledge discovery
- string matching
- structured documents
- machine learning
- domain knowledge
- topic maps
- pattern matching
- knowledge base