Mining web content outliers using structure oriented weighting techniques and N-grams.
Malik Agyemang, Ken Barker, Reda Alhajj
Published in: SAC (2005)
Keyphrases
- web content
- n gram
- website
- language model
- knowledge discovery
- web documents
- text classification
- variable length
- web pages
- text mining
- bag of words
- outlier detection
- web data
- user generated
- inside outside algorithm
- language modelling
- character n grams
- databases
- language independent
- language modeling
- social media
- part of speech
- knn
- data mining