Solving Cosine Similarity Underestimation between High Frequency Words by L2 Norm Discounting.
Saeth Wannasuphoprasit, Yi Zhou, Danushka Bollegala
Published in: CoRR (2023)
Keyphrases
- high frequency
- cosine similarity
- low frequency
- similarity function
- distance measure
- high resolution
- similarity measure
- document clustering
- wavelet transform
- subband
- tf-idf
- vector space model
- euclidean distance
- vector space
- n-gram
- semantic similarity
- wavelet coefficients
- k-means
- high frequency components
- computer vision
- low level
- query processing
- support vector
- multiscale