How Large Corpora Sizes Influence the Distribution of Low Frequency Text n-grams.
Joaquim F. Silva, José C. Cunha
Published in: PAKDD (3) (2024)
Keyphrases
- low frequency
- n-gram
- high frequency
- character n-grams
- text classification
- language model
- frequency domain
- text data
- wavelet transform
- word level
- bag of words
- language independent
- part of speech
- subband
- variable length
- information retrieval
- text documents
- language specific
- web documents
- keywords
- discrete wavelet transform
- text mining
- wavelet coefficients
- text retrieval
- high resolution
- word segmentation
- frequency band
- electromagnetic fields
- low and high frequency
- high frequency components
- fusion rules
- cross language
- DCT coefficients
- machine translation
- data fusion
- information extraction
- data analysis
- high quality
- similarity measure
- image processing