Agreement is overrated: A plea for correlation to assess human evaluation reliability.
Jacopo Amidei, Paul Piwek, Alistair Willis
Published in: INLG (2019)
Keyphrases
- evaluation criteria
- human subjects
- database
- data sets
- evaluation model
- search algorithm
- correlation coefficient
- evaluation measures
- evaluation method
- reliability assessment
- evaluation process
- human interaction
- highly correlated
- human behavior
- eye movements
- human computer interaction
- digital libraries
- case study
- neural network
- real time