Toucan: Token-Aware Character Level Language Modeling.
William Fleshman, Benjamin Van Durme
Published in: CoRR (2023)
Keyphrases
- language modeling
- language model
- retrieval model
- query expansion
- information retrieval
- n-gram
- probabilistic model
- cross-lingual
- statistical language models
- text classification
- test collection
- document retrieval
- machine learning
- TREC collections
- improvements in retrieval effectiveness
- search engine
- term weighting schemes
- comparable corpora
- sentence retrieval
- document length