Synthetic Data Generation with Large Language Models for Text Classification: Potential and Limitations.
Zhuoyan Li, Hangxiao Zhu, Zhuoran Lu, Ming Yin
Published in: EMNLP (2023)
Keyphrases
- language model
- data generation
- text classification
- language modeling
- n-gram
- co-training
- probabilistic model
- language modelling
- information retrieval
- retrieval model
- document retrieval
- statistical language modeling
- bag of words
- test collection
- query expansion
- active learning
- naive bayes
- text categorization
- context sensitive
- smoothing methods
- machine learning
- text mining
- text documents
- feature selection
- statistical language models
- language models for information retrieval
- data streams
- translation model
- high throughput
- query processing
- text classifiers
- kNN
- labeled data