An N-Gram Based Approach to Automatically Identifying Web Page Genre.
Jane E. Mason
Michael A. Shepherd
Jack Duffy
Published in:
HICSS (2009)
Keyphrases
n gram
web documents
web pages
language model
language modelling
language independent
text classification
website
bag of words
language modeling
variable length
search engine
word segmentation
Viterbi algorithm
part of speech
semi structured
query expansion
web search
information retrieval