Document Similarity for Texts of Varying Lengths via Hidden Topics.
Hongyu Gong, Tarek Sakakini, Suma Bhat, Jinjun Xiong
Published in: CoRR (2019)
Keyphrases
- document similarity
- latent dirichlet allocation
- text documents
- document clustering
- topic models
- document representation
- cosine similarity
- graph theory
- keywords
- word similarity
- semantic similarity
- information retrieval
- vector space model
- text mining
- generative model
- relevance model
- information extraction
- index terms
- similarity function
- text data
- high dimensional
- search engine