PyBench: Evaluating LLM Agent on various real-world coding tasks.
Yaolun Zhang, Yinxu Pan, Yudong Wang, Jie Cai, Zhi Zheng, Guoyang Zeng, Zhiyuan Liu
Published in: CoRR (2024)
Keyphrases
- real world
- multi agent systems
- case study
- intelligent agents
- multiagent systems
- wide range
- coding scheme
- multi agent
- dynamic environments
- agent systems
- autonomous agents
- agent architecture
- mental imagery
- decision theoretic
- multi task
- software agents
- mobile agents
- agent model
- pedagogical agents
- multiple agents
- multiple tasks
- multiscale