Segmenting DNA sequence into 'words' based on statistical language model
Wang Liang
Published in: CoRR (2012)
Keyphrases
- language model
- dna sequences
- n gram
- statistical language modeling
- language modeling
- language models for information retrieval
- information retrieval
- probabilistic model
- document representation
- translation model
- document retrieval
- multiword
- dna computing
- speech recognition
- human genome
- retrieval model
- language modelling
- word error rate
- smoothing methods
- word segmentation
- dependency structure
- gene structure prediction
- context sensitive
- test collection
- text classification
- ad hoc information retrieval
- mixture model
- vector space model
- query expansion
- pseudo relevance feedback
- rna sequences
- binding sites
- machine learning
- query terms
- document level
- statistical models
- semantic similarity
- maximum likelihood
- cross lingual
- word recognition