answer_match_acc
This metric is used for evaluating QA generation. It compares the predicted answer with the ground truth answer.
Classes
- AnswerMatchAcc: Metric for answer matching.
- class AnswerMatchAcc(type: Literal['exact_match', 'fuzzy_match'] = 'exact_match')
Bases:
BaseEvaluator
Metric for answer matching. It compares the predicted answer with the ground truth answer.
- Parameters:
type (str) – Type of matching evaluation. Can be “exact_match” or “fuzzy_match”. “exact_match” requires the predicted answer to be exactly the same as the ground truth answer. “fuzzy_match” requires the predicted answer to contain the ground truth answer.
Examples
>>> pred_answers = ["positive", "negative", "this is neutral"]
>>> gt_answers = ["positive", "negative", "neutral"]
>>> answer_match_acc = AnswerMatchAcc(type="exact_match")
>>> avg_acc, acc_list = answer_match_acc.compute(pred_answers, gt_answers)
>>> avg_acc
0.6666666666666666
>>> acc_list
[1.0, 1.0, 0.0]
>>> answer_match_acc = AnswerMatchAcc(type="fuzzy_match")
>>> avg_acc, acc_list = answer_match_acc.compute(pred_answers, gt_answers)
>>> avg_acc
1.0
>>> acc_list
[1.0, 1.0, 1.0]
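For reference, a minimal sketch of the two matching modes described by the type parameter. This illustrates the documented behavior (string conversion, equality vs. containment) and is not the library's actual implementation:

def _match(pred: object, gt: object, type: str = "exact_match") -> float:
    # Both inputs are converted to strings, as the docstrings describe.
    pred_str, gt_str = str(pred), str(gt)
    if type == "exact_match":
        # Exact match: the prediction must equal the ground truth exactly.
        return 1.0 if pred_str == gt_str else 0.0
    # Fuzzy match: the prediction only needs to contain the ground truth.
    return 1.0 if gt_str in pred_str else 0.0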
- compute_single_item(y: object, y_gt: object) → float
Compute the match accuracy of the predicted answer for a single query.
Any type of input is accepted for the predicted and ground truth answers; both are converted to strings before comparison.
- Parameters:
y (object) – Predicted answer.
y_gt (object) – Ground truth answer.
- Returns:
Match accuracy.
- Return type:
float
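Because inputs are converted to strings, non-string values can be scored directly. A usage sketch; the outputs shown assume the plain string comparison documented above:

>>> acc = AnswerMatchAcc(type="exact_match")
>>> acc.compute_single_item(42, "42")
1.0
>>> acc.compute_single_item("Paris.", "Paris")
0.0
>>> AnswerMatchAcc(type="fuzzy_match").compute_single_item("Paris.", "Paris")
1.0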
- compute(pred_answers: List[str], gt_answers: List[str]) → EvaluationResult
Compute the match accuracy of the predicted answer for a list of queries.
- Parameters:
pred_answers (List[str]) – List of predicted answer strings.
gt_answers (List[str]) – List of ground truth answer strings.
- Returns:
float: Average match accuracy.
List[float]: Match accuracy values for each query.
- Return type:
tuple
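A usage sketch following the Examples above, unpacking the result into the average accuracy and the per-query list as the doctest does; the outputs shown assume the documented containment-based fuzzy matching:

>>> evaluator = AnswerMatchAcc(type="fuzzy_match")
>>> preds = ["The capital is Paris", "4"]
>>> gts = ["Paris", "4"]
>>> avg_acc, acc_list = evaluator.compute(preds, gts)
>>> avg_acc
1.0
>>> acc_list
[1.0, 1.0]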