N-gram Fragment Sequence Based Unsupervised Domain-Specific Document Readability.
Shoaib Jameel, Xiaojun Qian, Wai Lam. Published in: COLING (2012)
Keyphrases
- n-gram
- domain-specific
- Viterbi algorithm
- language model
- web documents
- bag of words
- word level
- language-independent
- text classification
- language modeling
- document retrieval
- document images
- information retrieval
- document collections
- variable length
- document representation
- document ranking
- language modelling
- vector space model
- word segmentation
- information retrieval systems
- relevant documents
- keywords
- character n-grams
- part of speech
- query terms
- text documents
- retrieval systems
- semi-supervised
- term frequency
- document analysis
- information access
- ad hoc retrieval
- neural network
- knowledge discovery
- statistical language modeling
- inside-outside algorithm