Text vectorization via transformer-based language models and n-gram perplexities.
Mihailo Skoric
Published in: CoRR (2023)
Keyphrases
- n gram
- language model
- information retrieval
- language modeling
- character n grams
- document level
- language modelling
- probabilistic model
- document retrieval
- retrieval model
- text retrieval
- language independent
- word level
- part of speech
- speech recognition
- test collection
- bag of words
- query expansion
- context sensitive
- pseudo relevance feedback
- translation model
- web documents
- text mining
- keywords
- out of vocabulary
- statistical language modeling
- machine learning
- document analysis
- relevance model
- vector space model
- text documents
- text classification
- data mining
- sentence level
- query terms