Improving the performance of Naive Bayes multinomial in e-mail foldering by introducing distribution-based balance of datasets.
Pablo Bermejo, José A. Gámez, José Miguel Puerta. Published in: Expert Syst. Appl. (2011)
Keyphrases
- naive bayes
- uci datasets
- decision trees
- classification accuracy
- text classification
- logistic regression
- bayesian networks
- naive bayes classifier
- classification algorithm
- text categorization
- probability estimation
- training data
- feature selection
- cost sensitive
- uci data sets
- test instances
- text classifiers
- bayesian classifier
- benchmark datasets
- probability distribution
- base classifiers
- conditional independence assumption
- text mining
- bayesian network classifiers
- naive bayesian classifier
- random variables
- probabilistic classifiers
- data sets
- augmented naive bayes
- averaged one dependence estimators
- independence assumption
- probabilistic model
- information retrieval