answer_match_acc
This metric is used for evaluating QA generation. It compares the predicted answer with the ground truth answer.
Classes
- AnswerMatchAcc: Metric for answer matching.
- class AnswerMatchAcc(type: Literal['exact_match', 'fuzzy_match'] = 'exact_match')
Bases:
BaseEvaluator
Metric for answer matching. It compares the predicted answer with the ground truth answer.
- Parameters:
type (str) – Type of matching evaluation. Can be “exact_match” or “fuzzy_match”. “exact_match” requires the predicted answer to be exactly the same as the ground truth answer. “fuzzy_match” requires the predicted answer to contain the ground truth answer.
Examples
>>> pred_answers = ["positive", "negative", "this is neutral"]
>>> gt_answers = ["positive", "negative", "neutral"]
>>> answer_match_acc = AnswerMatchAcc(type="exact_match")
>>> avg_acc, acc_list = answer_match_acc.compute(pred_answers, gt_answers)
>>> avg_acc
0.6666666666666666
>>> acc_list
[1.0, 1.0, 0.0]
>>> answer_match_acc = AnswerMatchAcc(type="fuzzy_match")
>>> avg_acc, acc_list = answer_match_acc.compute(pred_answers, gt_answers)
>>> avg_acc
1.0
>>> acc_list
[1.0, 1.0, 1.0]
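For reference, a minimal sketch of the two matching modes described by the type parameter. This illustrates the documented behavior (string conversion, equality vs. containment) and is not the library's actual implementation:

def _match(pred: object, gt: object, type: str = "exact_match") -> float:
    # Both inputs are converted to strings, as the docstrings describe.
    pred_str, gt_str = str(pred), str(gt)
    if type == "exact_match":
        # Exact match: the prediction must equal the ground truth exactly.
        return 1.0 if pred_str == gt_str else 0.0
    # Fuzzy match: the prediction only needs to contain the ground truth.
    return 1.0 if gt_str in pred_str else 0.0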
- compute_single_item(y: object, y_gt: object) → float
Compute the match accuracy of the predicted answer for a single query.
Any type of input is accepted for the predicted and ground truth answers; both are converted to strings before comparison.
- Parameters:
y (object) – Predicted answer.
y_gt (object) – Ground truth answer.
- Returns:
Match accuracy.
- Return type:
float
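Because inputs are converted to strings, non-string values can be scored directly. A usage sketch; the outputs shown assume the plain string comparison documented above:

>>> acc = AnswerMatchAcc(type="exact_match")
>>> acc.compute_single_item(42, "42")
1.0
>>> acc.compute_single_item("Paris.", "Paris")
0.0
>>> AnswerMatchAcc(type="fuzzy_match").compute_single_item("Paris.", "Paris")
1.0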
- compute(pred_answers: List[str], gt_answers: List[str]) → EvaluationResult
Compute the match accuracy of the predicted answer for a list of queries.
- Parameters:
pred_answers (List[str]) – List of predicted answer strings.
gt_answers (List[str]) – List of ground truth answer strings.
- Returns:
float: Average match accuracy.
List[float]: Match accuracy values for each query.
- Return type:
tuple
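A usage sketch following the Examples above, unpacking the result into the average accuracy and the per-query list as the doctest does; the outputs shown assume the documented containment-based fuzzy matching:

>>> evaluator = AnswerMatchAcc(type="fuzzy_match")
>>> preds = ["The capital is Paris", "4"]
>>> gts = ["Paris", "4"]
>>> avg_acc, acc_list = evaluator.compute(preds, gts)
>>> avg_acc
1.0
>>> acc_list
[1.0, 1.0]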