Configuration
This page describes how to configure the data used to assess the model.
Options
| Name | Allowed | Default | Description |
|---|---|---|---|
| name |  |  | Name shown in outputs and tracking metadata. |
| description |  |  | Optional human-readable dataset description. |
| source |  |  | Path to a local data directory or file, a URL, or a named sample set such as |
| forward_batch_size |  |  | Optional batch size for the initial model forward pass used to produce predictions for metrics, report sample ranking, and |
| preprocessing |  |  | Data-side preprocessing applied in the loader, before the batch reaches the model. Per-image for image sources; per-batch on the stacked |
| model_input_transformation |  |  | Transformation applied at the model boundary, on every forward pass. Typical contents: Normalize. For Torch backends it stays inside autograd so attribution and adversarial budgets see the user-facing input space; for ONNX backends it runs on the tensor call path before the ONNX session. |
| labels.source |  |  | Optional path to a labels file (CSV, TSV, or Parquet), URL, or named sample set. When set to a sample name (e.g. |
| labels.id_column |  |  | Optional sample-ID column used to align labels with filenames, for example |
| labels.column |  |  | Optional class-label column. If omitted, one-hot numeric columns are reduced with |
| labels.encoding |  |  | Optional label parsing strategy. |
| labels.id_strategy |  |  | How label-file ids are matched against discovered sample files. |
| input_metadata |  |  | Input modality + layout hints, normally auto-inferred from |
| input_metadata.shape |  |  | The non-batch per-sample layout expected by the model (batch dim is implicit and resolved at runtime). Beyond informing output-space inference, |
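The description of model_input_transformation says the transformation runs at the model boundary on every forward pass, so callers keep working in the un-normalized input space. The sketch below illustrates that idea only; it is not raitap's implementation. It uses NumPy instead of a Torch or ONNX backend, and the names `normalize`, `with_input_transformation`, and `toy_model` are hypothetical.

```python
import numpy as np


def normalize(x, mean, std):
    """Channel-wise normalization for a batch laid out as (B, C, H, W)."""
    mean = np.asarray(mean, dtype=x.dtype).reshape(1, -1, 1, 1)
    std = np.asarray(std, dtype=x.dtype).reshape(1, -1, 1, 1)
    return (x - mean) / std


def with_input_transformation(model, transform):
    """Return a callable that applies `transform` before every call to `model`."""
    def forward(x):
        return model(transform(x))
    return forward


def toy_model(x):
    """Hypothetical stand-in for a real model: mean activation per sample."""
    return x.reshape(x.shape[0], -1).mean(axis=1)


mean, std = [0.5, 0.5, 0.5], [0.25, 0.25, 0.25]
wrapped = with_input_transformation(toy_model, lambda x: normalize(x, mean, std))

# Callers pass raw inputs; normalization happens inside the wrapped call.
batch = np.full((2, 3, 4, 4), 0.75, dtype=np.float32)
scores = wrapped(batch)
```

Because the normalization lives inside the wrapped callable, anything that perturbs or attributes over `batch` operates on the raw values rather than on the normalized ones.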
Examples
data:
  name: "my-dataset"
  description: "Internal validation set"
  source: "./data/images"
  forward_batch_size: 32
  preprocessing: model-bundled
  model_input_transformation: model-bundled
  labels:
    source: "./data/labels.csv"
    id_column: "image"
    column: "label"
    encoding: "index"
    id_strategy: "auto"
from raitap.data import (
    DataConfig,
    IdStrategy,
    LabelEncoding,
    LabelsConfig,
    Preprocessing,
)

data = DataConfig(
    name="my-dataset",
    description="Internal validation set",
    source="./data/images",
    forward_batch_size=32,
    preprocessing=Preprocessing.model_bundled,
    model_input_transformation=Preprocessing.model_bundled,
    labels=LabelsConfig(
        source="./data/labels.csv",
        id_column="image",
        column="label",
        encoding=LabelEncoding.index,
        id_strategy=IdStrategy.auto,
    ),
)
uv run raitap data.source="./data/images" data.preprocessing=model-bundled data.model_input_transformation=model-bundled data.labels.source="./data/labels.csv" data.labels.column=label
raitap data.source="./data/images" data.preprocessing=model-bundled data.model_input_transformation=model-bundled data.labels.source="./data/labels.csv" data.labels.column=label
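The dotted keys in the commands above appear to mirror the nesting of the YAML example: data.labels.column addresses the column key under labels under data. The sketch below shows how such overrides can be folded into a nested dictionary. It is an illustration under that assumption, not raitap's actual parser, and `apply_overrides` is a hypothetical helper that treats every value as a string.

```python
def apply_overrides(config, overrides):
    """Fold 'a.b.c=value' strings into a nested dict (values kept as strings)."""
    for item in overrides:
        dotted_key, _, raw_value = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = config
        for part in parents:
            # Create intermediate mappings on demand.
            node = node.setdefault(part, {})
        # Drop surrounding double quotes, as a shell usually would.
        node[leaf] = raw_value.strip('"')
    return config


cfg = apply_overrides({}, [
    'data.source="./data/images"',
    "data.preprocessing=model-bundled",
    "data.labels.column=label",
])
```

Here `cfg["data"]["labels"]["column"]` ends up as `"label"`, matching the position of that key in the YAML form.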
For tabular models whose backend expects an unusual per-sample layout (such
as ACAS Xu, a Torch network whose forward takes (N, 1, 1, 5)), supply
input_metadata.shape explicitly so the pipeline reshapes the flat feature
vectors before the forward pass:
data:
  input_metadata:
    kind: tabular
    layout: "(B,F)"
    feature_names: [rho, theta, psi, v_own, v_int]
    shape: [1, 1, 5]  # non-batch dims; reshapes (N, 5) -> (N, 1, 1, 5)
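The reshape described by the comment above can be pictured with a few lines of NumPy. This is a sketch of the operation only, assuming the flat features arrive as an (N, F) array; `reshape_to_model_layout` is a hypothetical name and not part of raitap.

```python
import numpy as np


def reshape_to_model_layout(features, sample_shape):
    """Reshape flat feature rows to (N, *sample_shape), keeping the batch dim."""
    features = np.asarray(features)
    n = features.shape[0]
    per_sample = int(np.prod(features.shape[1:]))
    expected = int(np.prod(sample_shape))
    if per_sample != expected:
        raise ValueError(
            f"each sample has {per_sample} values, but shape {list(sample_shape)} "
            f"needs {expected}"
        )
    return features.reshape((n, *sample_shape))


flat = np.arange(10, dtype=np.float32).reshape(2, 5)  # two samples, five features
shaped = reshape_to_model_layout(flat, [1, 1, 5])     # shape becomes (2, 1, 1, 5)
```

The batch dimension is never part of `sample_shape`, which matches the option's description of it as the non-batch per-sample layout.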
For nested ImageFolder-style layouts (e.g. data/test/<class>/<file>.jpg),
see Using your own data or built-in samples.