Mining Numbers in Text Using Suffix Arrays and Clustering Based on Dirichlet Process Mixture Models.
Minoru Yoshida, Issei Sato, Hiroshi Nakagawa, Akira Terada. Published in: PAKDD (2) (2010)
Keyphrases
- suffix array
- dirichlet process mixture models
- string matching
- text mining
- space efficient
- text retrieval
- data structure
- suffix tree
- information retrieval
- inverted file
- keywords
- text documents
- data compression
- similarity measurement
- bayesian model
- pattern matching
- test collection
- machine learning
- information extraction