A repetition based measure for verification of text collections and for text categorization.
Dmitry V. Khmelev, William John Teahan
Published in: SIGIR (2003)
Keyphrases
- text categorization
- text collections
- text documents
- text classification
- feature selection
- knn
- similarity measure
- k nearest neighbor
- textual data
- digital libraries
- support vector machine
- text classifiers
- test collection
- nearest neighbor
- data sets
- high dimensional
- knowledge base
- artificial intelligence
- information retrieval