retriever_recall

Retriever Recall @k metric.

Classes

RetrieverRecall()

Recall@k measures the ratio of the number of relevant context strings in the top-k retrieved context to the total number of ground truth relevant context strings.

class RetrieverRecall

Bases: BaseEvaluator

Recall@k measures the ratio of the number of relevant context strings in the top-k retrieved context to the total number of ground truth relevant context strings.

In our implementation, we use exact string matching between each ground truth (gt) context and the joined retrieved context string. You can instead use the longest common subsequence (LCS) or another similarity metric (including embedding-based ones) to decide whether a gt context counts as a match.
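As a minimal sketch of the exact-match rule described above (not the library's actual source), the hypothetical helper `exact_match_recall` below joins the retrieved strings and counts how many ground truth strings appear verbatim in the result. The single-space separator used for joining is an assumption.

```python
from typing import List, Union


def exact_match_recall(
    retrieved_context: Union[str, List[str]],
    gt_context: List[str],
) -> float:
    """Fraction of ground truth strings found verbatim in the retrieved context."""
    # Join a list of retrieved strings into one string (separator is an assumption).
    if isinstance(retrieved_context, list):
        retrieved_context = " ".join(retrieved_context)
    if not gt_context:
        raise ValueError("gt_context must contain at least one string")
    # A ground truth string is a hit when it is a substring of the joined context.
    hits = sum(1 for gt in gt_context if gt in retrieved_context)
    return hits / len(gt_context)


recall = exact_match_recall(
    "Apple is founded before Google.",
    [
        "Apple is founded in 1976.",
        "Google is founded in 1998.",
        "Apple is founded before Google.",
    ],
)
```

Only one of the three ground truth strings occurs in the retrieved text, so `recall` is one third. Swapping the substring test for an LCS or embedding similarity threshold would change only the line that computes `hits`.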

If you have only ground truth answers and no ground truth context, consider using the RAGAS framework for now. It computes recall as:

Recall = [GT statements that can be attributed to the retrieved context] / [GT statements]
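The ratio above can be sketched as follows. This is only an illustration of the arithmetic, not RAGAS code: how a statement is judged "attributed" is left to a caller-supplied predicate, and the names `statement_recall` and `toy_judge` are hypothetical.

```python
from typing import Callable, List


def statement_recall(
    gt_statements: List[str],
    retrieved_context: str,
    is_attributed: Callable[[str, str], bool],
) -> float:
    """Share of ground truth statements that the judge attributes to the context."""
    if not gt_statements:
        raise ValueError("gt_statements must contain at least one statement")
    attributed = sum(
        1 for statement in gt_statements if is_attributed(statement, retrieved_context)
    )
    return attributed / len(gt_statements)


def toy_judge(statement: str, context: str) -> bool:
    # Stand-in judge for illustration only: plain substring containment.
    return statement in context


score = statement_recall(
    ["Apple is founded in 1976.", "Apple is founded before Google."],
    "Apple is founded before Google.",
    toy_judge,
)
```

Here one of the two statements is attributed to the context, so `score` is 0.5. A real judge would replace `toy_judge` with a semantic check.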

Examples

>>> all_retrieved_context = [
    "Apple is founded before Google.",
    "February has 28 days in common years. February has 29 days in leap years. February is the second month of the year.",
]
>>> all_gt_context = [
    [
        "Apple is founded in 1976.",
        "Google is founded in 1998.",
        "Apple is founded before Google.",
    ],
    ["February has 28 days in common years", "February has 29 days in leap years"],
]
>>> retriever_recall = RetrieverRecall()
>>> avg_recall, recall_list = retriever_recall.compute(all_retrieved_context, all_gt_context)
>>> avg_recall
2 / 3
>>> recall_list
[1 / 3, 1.0]

compute(retrieved_contexts: List[str] | List[List[str]], gt_contexts: List[List[str]]) → EvaluationResult

Compute the recall of the retrieved context for a list of queries.

Parameters:

  • retrieved_contexts (Union[List[str], List[List[str]]]): List of retrieved context strings. With List[str], each query's context sentences are assumed to be already joined into one string.

  • gt_contexts (List[List[str]]): List of ground truth context strings.

Returns:

  • float: Average recall value.

  • List[float]: Recall values for each query.

Return type:

tuple
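To illustrate how per-query values and their average relate, here is a rough sketch of a batch computation over both accepted input shapes. The function names `recall_per_query` and `compute_recall_batch` are hypothetical, the space separator is an assumption, and the sketch returns a plain tuple rather than any library type.

```python
from typing import List, Sequence, Tuple, Union


def recall_per_query(retrieved: Union[str, List[str]], gt: List[str]) -> float:
    # Accept either one pre-joined string or a list of strings for a query.
    joined = retrieved if isinstance(retrieved, str) else " ".join(retrieved)
    return sum(1 for g in gt if g in joined) / len(gt)


def compute_recall_batch(
    retrieved_contexts: Sequence[Union[str, List[str]]],
    gt_contexts: Sequence[List[str]],
) -> Tuple[float, List[float]]:
    """Return (average recall, per-query recall list)."""
    if len(retrieved_contexts) != len(gt_contexts):
        raise ValueError("Both inputs must contain one entry per query")
    if not gt_contexts:
        raise ValueError("At least one query is required")
    recall_list = [
        recall_per_query(r, g) for r, g in zip(retrieved_contexts, gt_contexts)
    ]
    return sum(recall_list) / len(recall_list), recall_list


avg_recall, recall_list = compute_recall_batch(
    [
        "Apple is founded before Google.",
        ["February has 28 days in common years.", "February has 29 days in leap years."],
    ],
    [
        ["Apple is founded in 1976.", "Google is founded in 1998.", "Apple is founded before Google."],
        ["February has 28 days in common years", "February has 29 days in leap years"],
    ],
)
```

The first query matches one of three ground truth strings and the second matches both, so the per-query values are one third and 1.0, and their mean is two thirds.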