Modeling Documents for Structure Recognition Using Generalized N-Grams.
Rolf Brugger, Abdel Wahab Zramdini, Rolf Ingold
Published in: ICDAR (1997)
Keyphrases
- n gram
- language model
- character n grams
- text documents
- web documents
- text classification
- variable length
- relevance ranking
- bag of words
- information retrieval systems
- relevant documents
- language independent
- document collections
- document retrieval
- part of speech
- language modelling
- inside outside algorithm
- information retrieval
- language modeling
- image classification
- xml documents
- document ranking
- association rules
- keywords
- statistical language modeling
- data mining