Classification of heterogeneous text data for robust domain-specific language modeling.
Ján Stas, Jozef Juhár, Daniel Hládek
Published in: EURASIP J. Audio Speech Music. Process. (2014)
Keyphrases
- language modeling
- text data
- text classification
- language model
- n-gram
- text mining
- text clustering
- retrieval model
- machine learning
- text documents
- bag of words
- feature selection
- text classifiers
- naive bayes
- query expansion
- decision trees
- pattern recognition
- image classification
- kNN
- text categorization
- probabilistic model
- structured data
- feature extraction
- database systems
- supervised learning
- unsupervised learning
- semi-supervised
- labeled data
- image retrieval
- training set