Document Similarity Self-Join with MapReduce.
Ranieri Baraglia, Gianmarco De Francisci Morales, Claudio Lucchese
Published in: ICDM (2010)
Keyphrases
- document similarity
- graph theory
- document clustering
- cosine similarity
- document representation
- word similarity
- semantic similarity
- latent dirichlet allocation
- relevance model
- vector space model
- index terms
- similarity measure
- similarity function
- vector space
- image content
- text documents
- document collections
- digital libraries