retriever_recall

Retriever Recall @k metric.

Classes

RetrieverEvaluator()

Return Recall@k and Precision@k.

class RetrieverEvaluator

Bases: BaseEvaluator

Return Recall@k and Precision@k.

Recall@k = Number of relevant retrieved documents / Total number of relevant documents (len(gt_contexts))

Precision@k = Number of relevant retrieved documents / Total number of retrieved documents (len(retrieved_contexts))

This implementation uses exact string matching: each ground truth (gt) context is checked against the string formed by joining the retrieved contexts. To decide matches less strictly, you can use the longest common subsequence (LCS) or another similarity metric, including embedding-based ones.
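The matching rule can be sketched as follows. This is a minimal illustration, not the library's code: the helper names (`exact_match`, `lcs_length`, `lcs_match`, `recall`), the single-space join, the 0.9 threshold, and returning 0.0 for an empty ground truth list are all assumptions made for the example.

```python
from typing import Callable, List


def exact_match(gt: str, joined: str) -> bool:
    """A gt context matches if it appears verbatim in the joined retrieved text."""
    return gt in joined


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_match(gt: str, joined: str, threshold: float = 0.9) -> bool:
    """Hypothetical fuzzy rule: enough of gt is recoverable as a subsequence."""
    if not gt:
        return False
    return lcs_length(gt, joined) / len(gt) >= threshold


def recall(
    retrieved_context: List[str],
    gt_context: List[str],
    is_match: Callable[[str, str], bool] = exact_match,
) -> float:
    """Fraction of gt contexts that match the joined retrieved contexts."""
    if not gt_context:
        return 0.0  # assumption: nothing to recall counts as 0
    joined = " ".join(retrieved_context)
    hits = sum(1 for gt in gt_context if is_match(gt, joined))
    return hits / len(gt_context)


retrieved = ["Apple is founded before Google."]
gt = [
    "Apple is founded in 1976.",
    "Google is founded in 1998.",
    "Apple is founded before Google.",
]
print(recall(retrieved, gt))  # one of the three gt contexts matches exactly
```

A subsequence-based rule becomes lenient when the joined text is long, because the matching characters may be scattered anywhere in it, so the threshold needs tuning against real data.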

You can also pass the IDs of the retrieved and reference (ground truth) documents instead of their text.
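When IDs are available, both metrics reduce to set arithmetic. The sketch below is a standalone illustration of the two formulas, not the evaluator's own code path: the function name `recall_precision_from_ids` and the dictionary keys are hypothetical, and empty inputs are assumed to score 0.0.

```python
from typing import Dict, Hashable, Iterable


def recall_precision_from_ids(
    retrieved_ids: Iterable[Hashable], gt_ids: Iterable[Hashable]
) -> Dict[str, float]:
    """Compute recall and precision from document IDs using set intersection."""
    retrieved = set(retrieved_ids)
    relevant = set(gt_ids)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return {"recall": recall, "precision": precision}


# Three documents retrieved, two relevant, one in common.
print(recall_precision_from_ids(["d1", "d4", "d7"], ["d1", "d2"]))
# {'recall': 0.5, 'precision': 0.3333333333333333}
```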

If you have only ground truth answers and no ground truth contexts, consider using the RAGAS framework for now. It computes recall as:

Recall = [GT statements that can be attributed to the retrieved context] / [GT statements]
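The ratio itself is simple once each ground truth statement has been judged. The sketch below covers only that arithmetic and does not call RAGAS: the function name `statement_recall` is hypothetical, and the per-statement attribution flags are assumed to come from whatever judge you use.

```python
from typing import List


def statement_recall(attributed: List[bool]) -> float:
    """Share of GT statements judged attributable to the retrieved context."""
    if not attributed:
        return 0.0  # assumption: no statements means nothing was recalled
    return sum(attributed) / len(attributed)


# Two of three GT statements were judged supported by the retrieved context.
print(statement_recall([True, True, False]))  # 0.6666666666666666
```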

Examples

>>> all_retrieved_context = [
    ["Apple is founded before Google."],
    ["February has 28 days in common years. February has 29 days in leap years. February is the second month of the year."],
]
>>> all_gt_context = [
    [
        "Apple is founded in 1976.",
        "Google is founded in 1998.",
        "Apple is founded before Google.",
    ],
    ["February has 28 days in common years", "February has 29 days in leap years"],
]
>>> retriever_recall = RetrieverEvaluator()
>>> avg_recall, recall_list = retriever_recall.compute(all_retrieved_context, all_gt_context)
>>> avg_recall
2 / 3
>>> recall_list
[1 / 3, 1.0]

References

RAGAS: https://docs.ragas.io/en/stable/concepts/metrics/context_recall.html

compute_single_item(retrieved_context: List[str], gt_context: List[str]) → Dict[str, float]

Compute the recall of the retrieved context for a single query.

Parameters:

retrieved_context (List[str]) – List of retrieved context strings.
gt_context (List[str]) – List of ground truth context strings.

Returns:

Metric values for the query, keyed by metric name.

Return type:

Dict[str, float]

compute(retrieved_contexts: List[List[str]], gt_contexts: List[List[str]]) → EvaluationResult

Compute the recall of the retrieved context for a list of queries.

Parameters:

retrieved_contexts (List[List[str]]) – List of retrieved context strings for each query.
gt_contexts (List[List[str]]) – List of ground truth context strings for each query.

Returns:

float: Average recall value.
List[float]: Recall values for each query.

Return type:

tuple
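The aggregation over queries can be sketched as an average of per-query scores. This is an illustration under stated assumptions rather than the evaluator's implementation: `average_metric` and `substring_recall` are hypothetical helpers, and the sketch returns a plain tuple of the average and the per-query list.

```python
from typing import Callable, List, Tuple


def average_metric(
    retrieved_contexts: List[List[str]],
    gt_contexts: List[List[str]],
    per_query: Callable[[List[str], List[str]], float],
) -> Tuple[float, List[float]]:
    """Score each query with per_query, then average the scores."""
    if len(retrieved_contexts) != len(gt_contexts):
        raise ValueError("Expected one ground truth list per query.")
    scores = [per_query(r, g) for r, g in zip(retrieved_contexts, gt_contexts)]
    avg = sum(scores) / len(scores) if scores else 0.0
    return avg, scores


def substring_recall(retrieved: List[str], gt: List[str]) -> float:
    """Hypothetical per-query recall: verbatim match against the joined text."""
    if not gt:
        return 0.0
    joined = " ".join(retrieved)
    return sum(1 for g in gt if g in joined) / len(gt)


all_retrieved = [
    ["Apple is founded before Google."],
    ["February has 28 days in common years. February has 29 days in leap years."],
]
all_gt = [
    ["Apple is founded in 1976.", "Google is founded in 1998.", "Apple is founded before Google."],
    ["February has 28 days in common years", "February has 29 days in leap years"],
]
avg_recall, recall_list = average_metric(all_retrieved, all_gt, substring_recall)
print(avg_recall, recall_list)
```

Passing the per-query function as an argument keeps the averaging independent of the matching rule, so the same loop works for a stricter or fuzzier matcher.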