TencentLLMEval: A Hierarchical Evaluation of Real-World Capabilities for Human-Aligned LLMs.
Shuyi Xie, Wenlin Yao, Yong Dai, Shaobo Wang, Donlin Zhou, Lifeng Jin, Xinhua Feng, Pengzhi Wei, Yujie Lin, Zhichao Hu, Dong Yu, Zhengyou Zhang, Jing Nie, Yuhong Liu
Published in: CoRR (2023)
Keyphrases
- real world
- data mining
- wide range
- database
- neural network
- human subjects
- data sets
- evaluation criteria
- human behavior
- human beings
- evaluation methods
- human users
- evaluation metrics
- evaluation measures
- human body
- synthetic data
- real life
- bayesian networks
- database systems
- feature selection
- information retrieval
- databases