answer_match_acc

This metric is used to evaluate QA generation. It compares the predicted answer with the ground truth answer.

Classes

AnswerMatchAcc([type]) – Metric for answer matching.

class AnswerMatchAcc(type: Literal['exact_match', 'fuzzy_match'] = 'exact_match')

Bases: BaseEvaluator

Metric for answer matching. It compares the predicted answer with the ground truth answer.

Parameters:

type (str) – Type of matching evaluation. Can be “exact_match” or “fuzzy_match”. “exact_match” requires the predicted answer to be exactly the same as the ground truth answer. “fuzzy_match” requires the predicted answer to contain the ground truth answer.

Examples

>>> pred_answers = ["positive", "negative", "this is neutral"]
>>> gt_answers = ["positive", "negative", "neutral"]
>>> answer_match_acc = AnswerMatchAcc(type="exact_match")
>>> avg_acc, acc_list = answer_match_acc.compute(pred_answers, gt_answers)
>>> avg_acc
0.6666666666666666
>>> acc_list
[1.0, 1.0, 0.0]
>>> answer_match_acc = AnswerMatchAcc(type="fuzzy_match")
>>> avg_acc, acc_list = answer_match_acc.compute(pred_answers, gt_answers)
>>> avg_acc
1.0
>>> acc_list
[1.0, 1.0, 1.0]
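
The two matching modes are plain string comparisons. The sketch below is not the library's implementation, only an illustration of the rules stated in the type parameter description (exact equality vs. substring containment after str() conversion):

def match_score(pred: object, gt: object, mode: str = "exact_match") -> float:
    """Illustrative sketch of the two matching modes (not the library code)."""
    pred_str, gt_str = str(pred), str(gt)
    if mode == "exact_match":
        # Score 1.0 only when the two strings are identical.
        return 1.0 if pred_str == gt_str else 0.0
    if mode == "fuzzy_match":
        # Score 1.0 when the prediction contains the ground truth as a substring.
        return 1.0 if gt_str in pred_str else 0.0
    raise ValueError(f"Unknown matching mode: {mode}")
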
compute_single_item(y: object, y_gt: object) → float

Compute the match accuracy of the predicted answer for a single query.

Any type of input is accepted for y and y_gt; the values are converted to strings before comparison.

Parameters:
  • y (object) – Predicted answer.

  • y_gt (object) – Ground truth answer.

Returns:

Match accuracy for the single item: 1.0 if the answers match, 0.0 otherwise.

Return type:

float
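
As a usage sketch, assuming the string conversion described above, non-string inputs are compared by their string form:

>>> answer_match_acc = AnswerMatchAcc(type="exact_match")
>>> answer_match_acc.compute_single_item(42, "42")
1.0
>>> answer_match_acc.compute_single_item(42, "43")
0.0
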

compute(pred_answers: List[str], gt_answers: List[str]) → EvaluationResult

Compute the match accuracy of the predicted answers for a list of queries.

Parameters:
  • pred_answers (List[str]) – List of predicted answer strings.

  • gt_answers (List[str]) – List of ground truth answer strings.

Returns:

  • float: Average match accuracy.

  • List[float]: Match accuracy values for each query.

Return type:

EvaluationResult
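
The per-query scores make it easy to locate failing predictions. A usage sketch with fuzzy matching, following the unpacking pattern from the class example above:

>>> pred_answers = ["the answer is 42", "blue"]
>>> gt_answers = ["42", "red"]
>>> answer_match_acc = AnswerMatchAcc(type="fuzzy_match")
>>> avg_acc, acc_list = answer_match_acc.compute(pred_answers, gt_answers)
>>> avg_acc
0.5
>>> [i for i, score in enumerate(acc_list) if score == 0.0]
[1]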