base
Abstract base class for evaluation metrics.
Classes
class EvaluationResult(avg_score: float, per_item_scores: List[float] | None = None, additional_info: dict | None = None)[source]
Bases: object
Container for the result of an evaluation: the aggregate score, plus optional per-item scores and metadata.
avg_score: float
per_item_scores: List[float] | None = None
additional_info: dict | None = None
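A minimal sketch of how an `EvaluationResult` might be populated. The dataclass below mirrors the documented fields but is defined locally for illustration; in practice you would import it from this module.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EvaluationResult:
    # Local stand-in mirroring the documented fields.
    avg_score: float
    per_item_scores: Optional[List[float]] = None
    additional_info: Optional[dict] = None


scores = [1.0, 0.0, 1.0]
result = EvaluationResult(
    avg_score=sum(scores) / len(scores),
    per_item_scores=scores,
    additional_info={"n_items": len(scores)},
)
```

`per_item_scores` and `additional_info` are optional, so evaluators that only produce an aggregate score can return `EvaluationResult(avg_score=...)` alone.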
class BaseEvaluator(*args, **kwargs)[source]
Bases: object
compute_single_item(*args, **kwargs) → float[source]
Compute the score for a single item.
compute(*args, **kwargs) → Any[source]
Evaluate a list of predictions against ground-truth values and return the overall score and per-item scores.
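A sketch of how a concrete metric might subclass `BaseEvaluator`: override `compute_single_item` and let `compute` aggregate. The base class here is a local stand-in with an assumed `compute(predictions, ground_truths)` signature, and `ExactMatchEvaluator` is a hypothetical metric, not part of this module.

```python
from typing import Any, List, Tuple


class BaseEvaluator:
    # Local stand-in for the documented base class: subclasses override
    # compute_single_item(); compute() aggregates scores over item pairs.
    def compute_single_item(self, *args, **kwargs) -> float:
        raise NotImplementedError

    def compute(
        self, predictions: List[Any], ground_truths: List[Any]
    ) -> Tuple[float, List[float]]:
        per_item = [
            self.compute_single_item(pred, gt)
            for pred, gt in zip(predictions, ground_truths)
        ]
        avg = sum(per_item) / len(per_item) if per_item else 0.0
        return avg, per_item


class ExactMatchEvaluator(BaseEvaluator):
    # Hypothetical metric: 1.0 when the prediction equals the ground truth.
    def compute_single_item(self, pred: str, gt: str) -> float:
        return float(pred.strip() == gt.strip())


evaluator = ExactMatchEvaluator()
avg, per_item = evaluator.compute(["cat", "dog"], ["cat", "bird"])
print(avg, per_item)  # 0.5 [1.0, 0.0]
```

The `*args, **kwargs` signatures in the documented class leave the exact inputs up to each subclass; the two-list form above is one common convention.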