PyBench: Evaluating LLM Agent on various real-world coding tasks.

Yaolun Zhang, Yinxu Pan, Yudong Wang, Jie Cai, Zhi Zheng, Guoyang Zeng, Zhiyuan Liu

Published in: CoRR (2024)

Keyphrases

real world
multi agent systems
case study
intelligent agents
multiagent systems
wide range
coding scheme
multi agent
dynamic environments
agent systems
autonomous agents
agent architecture
mental imagery
decision theoretic
multi task
software agents
mobile agents
agent model
pedagogical agents
multiple agents
multiple tasks
multiscale