CToolEval: A Chinese Benchmark for LLM-Powered Agent Evaluation in Real-World API Interactions.

Zishan Guo, Yufei Huang, Deyi Xiong

Published in: ACL (Findings) (2024)

Keyphrases

real world
multi agent systems
multi agent
wide range
comparative analysis
dynamic environments
multiagent systems
interacting agents
evaluation model
evaluation method
autonomous agents
source code
data mining
intelligent agents
synthetic data
evaluation criteria
agent technology
quantitative evaluation
high level
Chinese language
data sets