TwHIN-BERT: A Socially-Enriched Pre-trained Language Model for Multilingual Tweet Representations
Xinyang Zhang, Yury Malkov, Omar Florez, Serim Park, Brian McWilliams, Jiawei Han, Ahmed El-Kishky
Published in: CoRR (2022)
Keyphrases
- language model
- pre-trained
- language modeling
- word clouds
- cross-lingual
- n-gram
- speech recognition
- query expansion
- retrieval model
- information retrieval
- training examples
- document retrieval
- training data
- probabilistic model
- ad hoc information retrieval
- test collection
- smoothing methods
- context sensitive
- named entities
- cross-language
- query terms
- bag-of-words
- translation model
- mixture model
- sentiment analysis
- face recognition
- cross-language information retrieval
- query specific
- machine learning
- learning algorithm
- feature selection
- data sets
- action recognition
- decision trees
- text mining
- information extraction
- active learning
- training set