Detecting near-duplicate documents using sentence-level features and supervised learning.
Yung-Shen Lin, Ting-Yi Liao, Shie-Jue Lee. Published in: Expert Syst. Appl. (2013)
Keyphrases
- sentence level
- supervised learning
- document level
- multi-document summarization
- co-occurrence
- sentiment analysis
- feature vectors
- information retrieval systems
- novelty detection
- feature space
- image features
- learning algorithm
- web documents
- text documents
- language model
- document clustering
- document collections
- training data
- feature selection
- search engine