An Apache Spark Implementation for Text Document Clustering.
Elias Dritsas, Maria Trigka, Gerasimos Vonitsanos, Andreas Kanavos, Phivos Mylonas
Published in: SMAP (2022)
Keyphrases
- document clustering
- text documents
- text mining
- text clustering
- automatic categorization
- document categorization
- document corpus
- topic detection
- document representation
- clustering algorithm
- non-negative matrix factorization
- document collections
- keywords
- vector space model
- text data
- web server
- document classification
- cluster analysis
- topic models
- text classification
- text retrieval
- natural language processing
- information extraction
- document clusters
- pairwise
- databases
- active learning
- automatic summarization
- website