Synthetic Data Generation with Large Language Models for Text Classification: Potential and Limitations.
Zhuoyan Li, Hangxiao Zhu, Zhuoran Lu, Ming Yin
Published in: CoRR (2023)
Keyphrases
- language model
- data generation
- text classification
- language modeling
- n-gram
- co-training
- probabilistic model
- language modelling
- document retrieval
- information retrieval
- statistical language models
- test collection
- retrieval model
- bag of words
- text mining
- smoothing methods
- statistical language modeling
- query expansion
- active learning
- naive bayes
- feature selection
- unlabeled data
- data streams
- high throughput
- information extraction
- text classifiers
- knn
- context-sensitive
- machine learning
- k-nearest neighbor
- translation model
- streaming data
- semi-supervised
- language models for information retrieval