STAGER checklist: Standardized Testing and Assessment Guidelines for Evaluating Generative AI Reliability.
Jinghong Chen, Lingxuan Zhu, Weiming Mou, Zaoqu Liu, Quan Cheng, Anqi Lin, Jian Zhang, Peng Luo
Published in: CoRR (2023)
Keyphrases
- artificial intelligence
- software reliability
- expert systems
- generative model
- ai systems
- knowledge based systems
- machine learning
- data driven
- failure rate
- test data
- ai methods
- intelligent behavior
- intelligent systems
- test generation
- decision support system
- automatic evaluation
- software engineering
- case study
- item response theory
- code coverage