NextLevelBERT: Investigating Masked Language Modeling with Higher-Level Representations for Long Documents.
Tamara Czinczoll, Christoph Hönes, Maximilian Schall, Gerard de Melo
Published in: CoRR (2024)
Keyphrases
- language modeling
- higher level
- information retrieval
- language model
- expert finding
- language modeling approaches
- retrieval model
- query expansion
- improvements in retrieval effectiveness
- document retrieval
- low level
- trec collections
- relevance model
- information retrieval systems
- query terms
- vector space model
- pseudo feedback
- document collections
- document length
- relevant documents
- cross lingual
- ad hoc information retrieval
- n gram
- probabilistic model
- term weighting
- text classification
- term dependencies
- statistical language models
- document ranking
- language modeling framework
- high level
- test collection
- text documents
- web documents
- retrieved documents
- expert search
- term weighting schemes
- document representation
- sentence retrieval
- word segmentation
- term frequency
- comparable corpora
- translation model
- relevance feedback
- vector space
- retrieval systems
- user queries
- digital libraries
- query specific
- text corpora
- keywords
- machine learning
- retrieval effectiveness