CToolEval: A Chinese Benchmark for LLM-Powered Agent Evaluation in Real-World API Interactions.
Zishan Guo, Yufei Huang, Deyi Xiong
Published in: ACL (Findings) (2024)
Keyphrases
- real world
- multi agent systems
- multi agent
- wide range
- comparative analysis
- dynamic environments
- multiagent systems
- interacting agents
- evaluation model
- evaluation method
- autonomous agents
- source code
- data mining
- intelligent agents
- synthetic data
- evaluation criteria
- agent technology
- quantitative evaluation
- high level
- Chinese language
- data sets