A Hybrid Statistical Data Pre-processing Approach for Language-Independent Text Classification.
Yanbo J. Wang, Frans Coenen, Robert Sanderson. Published in: ADMA (2009)
Keyphrases
- language independent
- text classification
- data pre-processing
- feature selection
- n-gram
- data analysis
- preprocessing
- text mining
- machine learning
- naive bayes
- text categorization
- bag of words
- data mining
- text documents
- dimension reduction
- cross lingual
- unsupervised learning
- labeled data
- pattern extraction
- unlabeled data
- data cleaning
- kNN
- data preparation
- feature space
- artificial intelligence
- data mining process
- databases
- k-nearest neighbor
- decision tree algorithm
- machine translation
- natural language
- dimensionality reduction