Detecting near-duplicate text documents with a hybrid approach.
Cihan Varol, Sairam Hari
Published in: J. Inf. Sci. (2015)
Keyphrases
- text documents
- bag of words
- text mining
- text classification
- text categorization
- topic models
- information extraction
- document classification
- keywords
- news articles
- textual information
- WordNet
- named entities
- document clustering
- text data
- tf-idf
- automatic text categorization
- text collections
- action recognition
- n-gram
- feature selection
- image representation
- generative model
- image classification
- computer vision
- artificial intelligence
- data sets