Structural analysis and regular expressions based noise elimination from web pages for web content mining.
Amit Dutta, Sudipta Paria, Tanmoy Golui, Dipak K. Kole
Published in: ICACCI (2014)
Keyphrases
- web content mining
- regular expressions
- structural analysis
- noise elimination
- web pages
- pattern matching
- web mining
- mathematical morphology
- web data extraction
- query language
- xml schema
- edge detection
- image processing
- three-dimensional
- matching algorithm
- semistructured data
- regular path queries
- query evaluation
- search engine
- web content
- xml documents
- web data
- image analysis
- website
- databases
- binary images
- conceptual model
- co-occurrence
- web usage mining
- keywords
- feature extraction