Contributing to the data module
This page describes how to add new built-in dataset configs to RAITAP.
Overview
The data module provides built-in sample datasets that can be referenced by name in data.source. These configs live in src/raitap/configs/data/ and define metadata and default loading behavior for common datasets.
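To see which names are currently available, the config directory can simply be listed. The snippet below is a minimal sketch, assuming each built-in dataset is a single .yaml file whose stem is the dataset name (as with cifar10.yaml); `list_builtin_datasets` is a hypothetical helper for illustration, not part of RAITAP.

```python
from pathlib import Path


def list_builtin_datasets(config_dir):
    """Return the sorted stems of all .yaml files directly inside config_dir.

    Hypothetical helper: assumes one config file per dataset and that the
    file stem is the dataset name.
    """
    return sorted(p.stem for p in Path(config_dir).glob("*.yaml"))
```

Called from the repository root as `list_builtin_datasets("src/raitap/configs/data")`, it returns the file stems found there.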
Adding a built-in dataset
To add a new built-in dataset config:
Create the config file
Add a YAML config under src/raitap/configs/data/:
    # src/raitap/configs/data/cifar10.yaml
    name: cifar10
    description: CIFAR-10 — 60k 32×32 colour images in 10 classes
    source: cifar10
Add loading support (optional)
If the dataset requires custom loading logic beyond what load_tensor_from_source() provides, add it to src/raitap/data/data.py or samples.py.
Use it
    uv run raitap data=cifar10
Update documentation
Add the new dataset name to the list in docs/modules/data/own-vs-built-in.md.
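Before committing a new config, a quick sanity check can catch a missing field. The following is a minimal sketch, not RAITAP code: it assumes configs are flat `key: value` files and treats the three keys from the cifar10 example (name, description, source) as required, which is an assumption. `parse_flat_config` and `missing_keys` are hypothetical helpers; a real check would more likely use a full YAML parser.

```python
# Assumption: the keys shown in the cifar10 example are the required ones.
REQUIRED_KEYS = ("name", "description", "source")


def parse_flat_config(text):
    """Parse flat 'key: value' lines, skipping blank lines and # comments."""
    config = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        key, sep, value = stripped.partition(":")
        if not sep:
            raise ValueError(f"not a 'key: value' line: {line!r}")
        config[key.strip()] = value.strip()
    return config


def missing_keys(text):
    """Return the required keys that are absent or empty in the config text."""
    config = parse_flat_config(text)
    return [key for key in REQUIRED_KEYS if not config.get(key)]


EXAMPLE = """\
# src/raitap/configs/data/cifar10.yaml
name: cifar10
description: CIFAR-10, 60k 32x32 colour images in 10 classes
source: cifar10
"""

print(missing_keys(EXAMPLE))
```

An empty list means all three keys are present and non-empty.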