RougeL score based on the longest common subsequence of words
Extractiveness score based on passage overlap
Recall score based on word overlap
Length in characters
Bert-KPrec score based on BERTScore similarity between the response and the passages
Accuracy of the predicted vs. gold answerability class
The response is coherent, natural, and not dismissive.
Annotator feedback explaining their score for fluency
The response provides an appropriate amount of useful information.
Annotator feedback explaining their score for answer relevance
The response is faithful to and grounded in the context.
Annotator feedback explaining their score for faithfulness
Number of times this model response is preferred over other model responses from the same task.
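
As a rough illustration of the overlap-style scores listed above, the following minimal sketch computes a word-overlap recall and an LCS-based ROUGE-L F1 for a response against a reference text. The function names, the lowercase whitespace tokenization, and the choice of F1 are assumptions made for illustration; the scores described here may be computed with different normalization or a dedicated library.

```python
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization (hypothetical choice;
    # the actual metrics may normalize text differently).
    return text.lower().split()


def overlap_recall(response, reference):
    # Fraction of reference words that also appear in the response,
    # counting repeated words at most as often as they occur in both.
    resp, ref = Counter(tokenize(response)), Counter(tokenize(reference))
    if not ref:
        return 0.0
    common = sum((resp & ref).values())
    return common / sum(ref.values())


def lcs_length(a, b):
    # Length of the longest common subsequence of two token lists,
    # computed row by row with dynamic programming.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(response, reference):
    # ROUGE-L style F1: harmonic mean of LCS-based precision and recall.
    resp, ref = tokenize(response), tokenize(reference)
    lcs = lcs_length(resp, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(resp)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def length_in_characters(response):
    # Length of the response in characters.
    return len(response)
```

For example, `overlap_recall("the cat sat", "the cat sat on the mat")` counts three of the six reference words as covered, and `rouge_l_f1` on the same pair combines an LCS precision of 1.0 with an LCS recall of 0.5.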