Statistical Evaluation of Measure and Distance on Document Classification Problems in Text Mining.
Masayuki Goto, Takashi Ishida, Shigeichi Hirasawa
Published in: CIT (2007)
Keyphrases
- text mining
- text documents
- distance measure
- textual documents
- document clustering
- text clustering
- document classification
- information retrieval
- dissimilarity measure
- text classification
- similarity measure
- natural language processing
- web documents
- machine learning
- euclidean distance
- document collections
- information extraction
- keywords
- semantic information
- document images
- distance metric
- sentiment analysis
- text data
- data mining
- distance function
- relevant documents
- document retrieval
- text categorization
- tf-idf
- feature selection