Distributed Classification of Text Documents on Apache Spark Platform.
Piotr Semberecki, Henryk Maciejewski. Published in: ICAISC (1) (2016)
Keyphrases
- text documents
- text classification
- document classification
- text mining
- text clustering
- automatic text categorization
- information extraction
- text categorization
- keywords
- WordNet
- document clustering
- machine learning
- classification accuracy
- text data
- feature extraction
- named entities
- decision trees
- supervised learning
- bag of words
- classification algorithm
- feature selection
- text classifiers
- reinforcement learning
- topic models
- image classification
- pairwise
- n-gram
- unsupervised learning
- maximum likelihood
- search engine
- image segmentation
- training set
- databases