Solving Cosine Similarity Underestimation between High Frequency Words by ℓ₂ Norm Discounting.
Saeth Wannasuphoprasit, Yi Zhou, Danushka Bollegala
Published in: ACL (Findings) (2023)
Keyphrases
- high frequency
- cosine similarity
- low frequency
- ℓ₂ norm
- high resolution
- similarity measure
- tf-idf
- distance measure
- wavelet transform
- similarity function
- subband
- low rank
- keywords
- text documents
- vector space model
- document clustering
- vector space
- euclidean distance
- semantic similarity
- data sets
- k-means
- information retrieval
- wavelet coefficients
- support vector machine
- data representation
- computational complexity
- bayesian networks