Unsupervised Multiword Segmentation of Large Corpora using Prediction-Driven Decomposition of n-grams.
Julian Brooke, Vivian Tsang, Graeme Hirst, Fraser Shein
Published in: COLING (2014)
Keyphrases
- agglomerative clustering
- n-gram
- multiword
- part of speech
- language model
- natural language processing
- word segmentation
- language modeling
- bag of words
- language-independent
- context-sensitive
- text classification
- probabilistic model
- document retrieval
- document representation
- information retrieval
- text collections
- text categorization
- semi-supervised
- multiscale