Text document clustering based on frequent word sequences.
Yanjun Li, Soon Myoung Chung
Published in: CIKM (2005)
Keyphrases
- text documents
- keywords
- text corpus
- text mining
- term frequency
- text categorization
- text classification
- information extraction
- frequency counts
- topic models
- document classification
- tf-idf
- hidden markov models
- wordnet
- co-occurrence
- n-gram
- textual information
- document clustering
- frequent patterns
- document representation
- event sequences
- frequent sequences
- bag of words
- sequential patterns
- named entities
- text collections
- text databases
- neural network
- machine learning
- natural language processing
- active learning
- text corpora
- feature vectors
- information retrieval