Discourse Type Clustering using POS n-gram Profiles and High-Dimensional Embeddings.
Christelle Cocco
Published in: EACL (2012)
Keyphrases
- n gram
- high dimensional
- high dimensional data
- part of speech
- low dimensional
- language model
- data points
- dimensionality reduction
- bag of words
- text classification
- language independent
- manifold learning
- variable length
- language modelling
- language modeling
- word segmentation
- clustering method
- similarity search
- document clustering
- low dimensional spaces
- artificial intelligence
- inside outside algorithm
- vector space
- nearest neighbor
- k means
- feature space
- natural language
- web documents
- semi supervised
- probabilistic model
- viterbi algorithm
- character n grams
- statistical language modeling