Document Classification Using Word2Vec and Chi-square on Apache Spark.
Mijin Choi, Rize Jin, Tae-Sun Chung
Published in: CSA/CUTE (2016)
Keyphrases
- document classification
- chi square
- term frequency
- text categorization
- text classification
- information gain
- text documents
- logistic regression
- web documents
- text mining
- mutual information
- classification algorithm
- n gram
- feature selection
- correlation coefficient
- confidence intervals
- knn
- tf idf
- decision trees
- co occurrence
- naive bayes
- data sets
- natural language processing
- information extraction
- document clustering
- labeled data
- topic models
- k nearest neighbor
- probabilistic model
- feature space
- support vector
- similarity measure
- web pages
- neural network