Contributing to the metrics module¶
This page describes the internal metrics architecture and how to extend it with new metric computers.
Overview¶
The metrics module evaluates model predictions against ground-truth targets. Metric computers wrap underlying libraries (torchmetrics, faster-coco-eval) behind a unified interface driven by Hydra _target_ instantiation.
All metric computers implement BaseMetricComputer, which defines:
reset() -> Noneupdate(predictions, targets) -> Nonecompute() -> MetricResult
The MetricResult dataclass contains:
metrics: dict[str, float]— scalar metrics written tometrics.jsonartifacts: dict[str, Any]— structured outputs written toartifacts.json
Important files¶
The factory.py module provides the evaluate() entry point, which uses Hydra’s instantiate() to build metric computers from _target_ keys. Bare class names are automatically resolved to raitap.metrics.* paths.
Runtime flow¶
Metrics run after the forward pass in src/raitap/run/pipeline.py. RAITAP instantiates the configured metric computer, calls update(predictions, targets), then calls compute() to generate the final results.
The metrics module writes three files under the Hydra run folder:
metrics/
├── metrics.json # scalar results
├── artifacts.json # structured outputs (e.g., per-class values)
└── metadata.json # config and experiment metadata
Adding a new metric computer¶
To add a new metric type:
Implement the metric computer
Create a new metric computer class that extends
BaseMetricComputerinsrc/raitap/metrics/:# src/raitap/metrics/new_metrics.py from .base_metric import BaseMetricComputer, MetricResult import torch class NewMetrics(BaseMetricComputer): def __init__(self, **config_kwargs): # Store configuration pass def reset(self) -> None: """Required: Reset internal state.""" pass def update(self, predictions: torch.Tensor, targets: torch.Tensor) -> None: """Required: Update internal state with a batch of predictions and targets.""" predictions, targets = self._prepare_inputs(predictions, targets) # Your update logic here pass def compute(self) -> MetricResult: """Required: Compute final metrics and return MetricResult.""" return MetricResult( metrics={"metric_name": 0.0}, # Scalar metrics artifacts={} # Structured outputs )
Reference
src/raitap/metrics/classification_metrics.pyordetection_metrics.pyfor complete examples.Export from
__init__.pyExport the class from the metrics package:
# src/raitap/metrics/__init__.py from .new_metrics import NewMetrics __all__ = [..., "NewMetrics"]
Create a config preset
Add a config file under
src/raitap/configs/metrics/:# src/raitap/configs/metrics/new_metrics.yaml _target_: NewMetrics # Add configuration parameters here
Use it
uv run raitap metrics=new_metrics uv run raitap metrics=new_metrics metrics.some_param=value
Add tests
Create unit tests in
src/raitap/metrics/tests/following the patterns intest_classification_metrics.pyortest_detection_metrics.py.Update documentation
Add the new metric type to
docs/modules/metrics/frameworks-and-libraries.mdwith usage examples and parameter descriptions.
Extension points¶
Metric computers can wrap any evaluation library. The base class handles device management through _prepare_inputs(), which moves predictions and targets to the appropriate device before calling update().
For metrics that maintain internal state (like torchmetrics classes), store them as instance attributes and update them in update(). Call their compute() method in your compute() implementation and map results to the MetricResult structure.