BERTTM: Leveraging Contextualized Word Embeddings from Pre-trained Language Models for Neural Topic Modeling.
Zheng Fang, Yulan He, Rob Procter
Published in: CoRR (2023)
Keyphrases
- topic modeling
- language model
- pre-trained
- n-gram
- topic models
- language modeling
- information retrieval
- probabilistic model
- statistical language modeling
- text classification
- training data
- latent dirichlet allocation
- document retrieval
- text mining
- training examples
- neural network
- retrieval model
- language modeling framework
- collaborative filtering
- co-occurrence
- speech recognition
- query expansion
- text documents
- bag of words
- test collection
- vector space model
- vector space
- language independent
- search engine
- query terms
- latent variables
- cross-lingual
- keywords
- support vector
- high-dimensional data
- machine learning
- data mining
- low-dimensional
- data sets