Classifying Web Pages by Genre: An n-Gram Approach.
Jane E. Mason, Michael A. Shepherd, Jack Duffy
Published in: Web Intelligence (2009)
Keyphrases
- n gram
- classifying web pages
- text classification
- bag of words
- language independent
- web documents
- web pages
- naive bayes
- variable length
- text categorization
- feature selection
- language model
- part of speech
- language modeling
- machine learning
- text mining
- language modelling
- word segmentation
- databases
- inside outside algorithm
- knn
- word level
- naive bayes classifier
- character n grams
- neural network