TwHIN-BERT: A Socially-Enriched Pre-trained Language Model for Multilingual Tweet Representations at Twitter
Xinyang Zhang, Yury Malkov, Omar Florez, Serim Park, Brian McWilliams, Jiawei Han, Ahmed El-Kishky
Published in: KDD (2023)
Keyphrases
- language model
- pre-trained
- social media
- language modeling
- micro-blogging
- word clouds
- cross-lingual
- n-gram
- training data
- document retrieval
- information retrieval
- probabilistic model
- retrieval model
- twitter users
- speech recognition
- ad hoc information retrieval
- training examples
- online social networks
- social networking
- query expansion
- mixture model
- cross-language
- relevance model
- social networks
- test collection
- control signals
- query terms
- data sets
- named entities
- smoothing methods
- cross-language information retrieval
- translation model
- neural network
- small number
- supervised learning