Filtering Very Similar Text Documents: A Case Study.
Jirí Hroza, Jan Zizka, Ales Bourek. Published in: CICLing (2004)
Keyphrases
- text documents
- text mining
- text classification
- text categorization
- information extraction
- news articles
- keywords
- wordnet
- topic models
- textual information
- document classification
- document clustering
- tf-idf
- named entities
- bag of words
- automatic text categorization
- text corpus
- text collections
- question answering
- information extraction systems
- artificial intelligence
- real world
- natural language processing
- relevant concepts
- machine learning
- information retrieval
- text data
- learning algorithm
- image processing
- face recognition
- feature space
- training set
- feature vectors