Classification and Clustering of arXiv Documents, Sections, and Abstracts, Comparing Encodings of Natural and Mathematical Language.
Philipp Scharpf, Moritz Schubotz, Abdou Youssef, Felix Hamborg, Norman Meuschke, Bela Gipp
Published in: JCDL (2020)
Keyphrases
- document clustering
- document classification
- text clustering
- clustering algorithm
- unsupervised learning
- unsupervised clustering
- text classification
- pre-classified
- automatic categorization
- document categorization
- clustering analysis
- clustering method
- supervised classification
- classification accuracy
- image classification
- k-means
- pattern recognition
- support vector
- programming language
- xml documents
- machine learning
- classification algorithm
- unsupervised classification
- web documents
- high dimensionality
- feature extraction
- decision trees
- feature selection
- information retrieval
- cluster analysis
- document retrieval
- text documents
- relevant documents
- automatic classification
- supervised learning
- text classifiers
- target language
- support vector machine
- training set
- training data
- multilingual documents