An Empirical Study of LLM-as-a-Judge for LLM Evaluation: Fine-tuned Judge Models are Task-specific Classifiers.
Hui Huang, Yingqi Qu, Jing Liu, Muyun Yang, Tiejun Zhao
Published in: CoRR (2024)
Keyphrases
- fine tuned
- classification models
- statistical models
- decision trees
- training data
- hierarchical models
- machine learning approaches
- trained classifiers
- test set
- parameter estimation
- probabilistic model
- statistical model
- experimental data
- markov random field
- ensemble learning
- classification rate
- supervised classification
- feature set
- data points
- neural network