Supported libraries

constructor, call, and raitap keys

Assessors support three config buckets:

  • constructor: kwargs for the assessor constructor or underlying library object

  • call: verbatim library kwargs for the per-call attack invocation

  • raitap: RAITAP-owned runtime options such as batching, progress bars, and sample-name metadata

Visualisers continue to support constructor and call only.

This keeps the boundary clear for users: call is what the library sees at attack time, while raitap is what RAITAP itself consumes.

For each framework, the perturbation budget keys (eps, alpha, steps) are read from exactly one of constructor: or call:; the adapter ignores them in the other block. RAITAP picks the authoritative side automatically:

| Adapter | Budget block | Why |
|---|---|---|
| TorchattacksAssessor | constructor: | The adapter does attack_class(model, **constructor) once and never forwards per-call budget kwargs. |
| FoolboxAssessor | call: | Foolbox attacks read epsilons=... at attack(fmodel, inputs, targets, epsilons=...). |

Putting budget kwargs in the wrong block emits a warning so the misconfiguration is visible in the run log.
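The placement rule above can be sketched in a few lines of Python. This is only an illustration, not RAITAP's actual code: the helper name find_misplaced_budget_keys, the two lookup tables, and the warning text are all hypothetical.

```python
import warnings

# Hypothetical tables mirroring the rule described above (not RAITAP's real code).
BUDGET_KEYS = frozenset({"eps", "alpha", "steps"})
AUTHORITATIVE_BLOCK = {
    "TorchattacksAssessor": "constructor",
    "FoolboxAssessor": "call",
}


def find_misplaced_budget_keys(adapter: str, constructor: dict, call: dict) -> list[str]:
    """Return budget keys found in the block the adapter ignores, warning for each."""
    authoritative = AUTHORITATIVE_BLOCK[adapter]
    if authoritative == "constructor":
        ignored_name, ignored = "call", call
    else:
        ignored_name, ignored = "constructor", constructor
    misplaced = sorted(BUDGET_KEYS & set(ignored))
    for key in misplaced:
        warnings.warn(
            f"{adapter}: '{key}' in {ignored_name}: is ignored; move it to {authoritative}:",
            stacklevel=2,
        )
    return misplaced
```

With this sketch, a Torchattacks config that sets steps under call: yields ["steps"] plus one warning, while a Foolbox config that sets eps under call: yields an empty list.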

Typed semantics and visualiser compatibility

RAITAP uses typed AssessmentKind, ThreatModel, Objective, and PerturbationBudget semantics to validate visualisers against the result they receive. In short:

  • assessors produce typed RobustnessResult.semantics

  • visualisers declare which AssessmentKind they can render via the supported_assessment_kinds: ClassVar[frozenset[AssessmentKind]] attribute

  • the factory rejects incompatible pairings at YAML parse time (AssessmentKindVisualiserIncompatibilityError)

  • image visualisers additionally refuse non-image results (input_spec.kind != IMAGE)
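As a rough sketch of the pairing check listed above, the snippet below uses stand-in classes. Only the attribute name supported_assessment_kinds and the three kind names come from this page; the enum definition, the exception's base class, and the validate_pairing helper are assumptions made for illustration.

```python
from enum import Enum, auto
from typing import ClassVar


class AssessmentKind(Enum):
    """Stand-in for RAITAP's enum; only these three members are named in this page."""
    EMPIRICAL_ATTACK = auto()
    FORMAL_VERIFICATION = auto()
    STATISTICAL_SAMPLING = auto()


class AssessmentKindVisualiserIncompatibilityError(ValueError):
    """Stand-in exception; the real base class is not stated here."""


class VerdictSummaryVisualiser:
    """Stand-in carrying only the compatibility declaration."""
    supported_assessment_kinds: ClassVar[frozenset[AssessmentKind]] = frozenset(
        {AssessmentKind.FORMAL_VERIFICATION}
    )


def validate_pairing(visualiser_cls: type, kind: AssessmentKind) -> None:
    """Hypothetical factory-side check: reject a visualiser that cannot render `kind`."""
    if kind not in visualiser_cls.supported_assessment_kinds:
        raise AssessmentKindVisualiserIncompatibilityError(
            f"{visualiser_cls.__name__} cannot render {kind.name} results"
        )
```

Because the declaration is a class attribute, a check like this can run before any visualiser is instantiated, which is what makes rejection at config-parse time possible.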

| Visualiser | Supports | Notes |
|---|---|---|
| ImagePairVisualiser | EMPIRICAL_ATTACK | Renders N rows by 3 columns: clean, perturbed, signed perturbation heatmap. Rejects tabular / time-series / token results. |
| PerturbationHeatmapVisualiser | EMPIRICAL_ATTACK | Per-sample diverging heatmap of the perturbation. Default channel reduction is signed_dominant (preserves sign without cancelling opposing channels). Other modes: mean, mean_abs, max_abs. |
| VerdictSummaryVisualiser | FORMAL_VERIFICATION | Two-panel summary: verdict-count bar chart plus a runtime histogram per verified sample. |
| OutputBoundsCohortVisualiser | FORMAL_VERIFICATION | Boxplot of certified per-class output-bound widths (upper - lower) across the verified batch. Constructor kwargs: whis, show_outliers. Renders a placeholder figure when result.output_bounds is None or all rows are NaN. |
| OutputBoundsPinnedVisualiser | FORMAL_VERIFICATION | Per-sample plot of [lower_k, upper_k] certified intervals for each output class with the target class highlighted. Constructor kwargs: max_samples, max_classes (default 20; above it, shows the target plus the classes with the largest certified upper bounds so many-class models like ImageNet stay legible), target_color, bar_color, sample_indices. Falls back to a placeholder when bounds are absent. |
| OutputBoundsWidthHeatmapVisualiser | FORMAL_VERIFICATION | Heatmap of certified per-class output-bound widths (upper - lower) across the verified batch (rows = samples, columns = classes). Constructor kwargs: cmap, max_samples, figsize. Renders a placeholder figure when result.output_bounds is None or every row is NaN. |
| OutputBoundsMarginHeatmapVisualiser | FORMAL_VERIFICATION | Heatmap of signed per-class margins relative to the target class's lower bound (rows = samples, columns = classes; target cell masked). Constructor kwargs: cmap, max_samples, figsize. Falls back to a placeholder when bounds or targets are absent. |
| CorruptionAccuracyVisualiser | STATISTICAL_SAMPLING | Clean vs corrupted accuracy bars with a CI whisker. Annotated with corruption name, severity, and N. |

Empirical image visualisers declare whether they embed a clean-input panel or a perturbation-map panel by default. Compact reporting uses those declarations to choose one canonical owner per facet and ask non-owners to omit repeated panels. The runtime kwargs are include_clean_input and include_perturbation_map. They affect report-only renders; persisted visualiser PNGs remain self-contained. Verifier visualisers keep the default facet flags (False) and do not need to accept these kwargs unless they explicitly opt into the contract.

Contributor-facing details about the assessor / visualiser internals are in Contributing to the robustness module.

Assessor libraries

Torchattacks

TorchattacksAssessor wraps every attack class in torchattacks via dynamic loading; the YAML algorithm: field names the class. White-box attacks require a torch backend with autograd (the adapter rejects ONNX backends with AssessorBackendIncompatibilityError). Inputs are made contiguous before the call so that attacks which call view(...) internally (e.g. PGDL2, CW, DeepFool, Square) work on RAITAP's loader output, which produces non-contiguous NCHW tensors via an HWC→CHW transpose.

The adapter registers 36 attacks, covering all dispatchable torchattacks classes. Excluded:

  • VANILA: no-op (returns input unchanged; attack success rate is always 0).

  • LGV: needs a training dataloader and training epochs (tracked: #276).

  • MultiAttack: combinator over a list of sub-attacks; needs nested config (tracked: #279).

JSMA caveat: hardcodes target=(labels+1)%10 for untargeted mode and is only valid on 10-class models. RAITAP raises a clear error if it detects the model has a different class count, and warns if the count cannot be determined at runtime.
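To see why the hardcoded rule only fits 10-class models, the short sketch below replays the target formula quoted above in plain Python. The helper names are hypothetical and nothing here calls torchattacks itself.

```python
def jsma_untargeted_targets(labels: list[int]) -> list[int]:
    """Targets picked by the hardcoded rule target = (label + 1) % 10."""
    return [(label + 1) % 10 for label in labels]


def out_of_range_targets(labels: list[int], num_classes: int) -> list[int]:
    """Targets that do not index a real class of a model with `num_classes` outputs."""
    return [t for t in jsma_untargeted_targets(labels) if t >= num_classes]


# 5-class model: label 4 is mapped to class 5, which does not exist.
print(out_of_range_targets([0, 4], num_classes=5))

# 1000-class model: every target stays inside 0..9, so 990 classes are never targeted.
print(jsma_untargeted_targets([9, 57, 999]))
```

With fewer than 10 classes the rule can select a non-existent class; with more than 10 it silently restricts the search to the first ten classes.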

APGDT and FAB default to n_classes=10 for their targeted search. On a model with a different class count, set constructor.n_classes to the real number, or the targeting is silently wrong.

A representative sample:

| Algorithm | Threat model | Norm | Notes |
|---|---|---|---|
| FGSM | white-box | L∞ | Single-step gradient sign. CPU-friendly. |
| BIM | white-box | L∞ | Iterative FGSM. |
| PGD | white-box | L∞ | Projected gradient descent. |
| PGDL2 | white-box | L2 | L2 variant of PGD. |
| CW | white-box | L2 | Carlini-Wagner optimisation attack. |
| DeepFool | white-box | L2 | Iterative linearisation. |
| MIFGSM | white-box | L∞ | Momentum-iterative FGSM. |
| AutoAttack | white-box | L∞ | Ensemble of attacks; expensive. |
| Square | black-box (score) | L∞ | Score-based query attack. |
| OnePixel | black-box (score) | L0 | Differential-evolution single-pixel attack. |
| JSMA | white-box | L0 | 10-class models only (see caveat above). |

Foolbox

FoolboxAssessor wraps foolbox.attacks.<algorithm> against a foolbox.PyTorchModel(model, bounds=..., preprocessing=...). Bounds default to (0.0, 1.0), matching RAITAP's loader. The adapter accepts only scalar eps / epsilons; multi-epsilon sweeps would change the result tensor shape across configurations and break the uniform RobustnessResult contract — they are intentionally out of scope for the current adapter.

The adapter registers ~55 attacks, covering all dispatchable foolbox classes. Alias names (e.g. PGD, FGSM) are deduped to their canonical norm-prefixed class (e.g. LinfPGD, LinfFastGradientAttack). Notable caveats:

  • FlexibleDistance attacks (GaussianBlurAttack, InversionAttack, BinarySearchContrastReductionAttack, LinearSearchContrastReductionAttack, LinearSearchBlendedUniformNoiseAttack) require an explicit norm: set constructor.distance (e.g. distance: l2). RAITAP raises an actionable error if it is missing.

  • VirtualAdversarialAttack requires constructor.steps (number of power iterations).

  • Brendel-Bethge family (L0BrendelBethgeAttack, L1BrendelBethgeAttack, L2BrendelBethgeAttack, LinfinityBrendelBethgeAttack) needs numba, now included in the foolbox extra.

  • DatasetAttack is supported: RAITAP feeds the input batch as its reference pool automatically. The attack pastes in samples from that pool, so it is most meaningful when the assessed batch is large and diverse (a tiny batch gives it little to draw from).

Excluded:

  • BinarizationRefinementAttack: needs starting points / attack chaining (tracked: #277).

  • SpatialAttack: rotation/translation, not norm-bounded; needs a non-norm budget surface (tracked: #278).

  • PointwiseAttack: starting-point attack (tracked: #280).

| Algorithm | Threat model | Norm | Notes |
|---|---|---|---|
| LinfPGD | white-box | L∞ | |
| L2PGD | white-box | L2 | |
| LinfFastGradientAttack | white-box | L∞ | |
| L2FastGradientAttack | white-box | L2 | |
| L2CarliniWagnerAttack | white-box | L2 | |
| L2DeepFoolAttack | white-box | L2 | |
| BoundaryAttack | black-box (decision) | L2 | |
| DatasetAttack | black-box (decision) | L2 | Input batch fed as pool automatically. |

Marabou

MarabouAssessor wraps maraboupy>=2.0 to provide SAT/UNSAT-based formal verification for L∞ box perturbations over static-shape ONNX MLPs. Verdicts land in RobustnessResult.verdicts (VERIFIED / FALSIFIED / UNKNOWN / ERROR) and counter-examples in perturbed_inputs.

Marabou reads and reasons over the bare ONNX graph and bypasses every Python preprocessing module: data.preprocessing and data.model_input_transformation are skipped regardless of origin, so custom-file modules that the ONNX tensor backend would normally apply are not invoked by Marabou. Model-bundled preprocessing is not available for ONNX models at all. If the formal property must include preprocessing, either preprocess inputs before export or encode the preprocessing directly in the ONNX graph.

Algorithms

| algorithm | Property |
|---|---|
| linf-box | Per-input box [x_i - eps, x_i + eps] plus an output disjunction asserting "any non-target class dominates the target". UNSAT → VERIFIED, SAT → FALSIFIED with reconstructed counter-example. |
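The linf-box property can be restated in plain Python to make the SAT/UNSAT reading concrete. This is a sketch of the property only, not of how MarabouAssessor encodes it: the function names are hypothetical, and treating "dominates" as greater-than-or-equal is an assumption.

```python
def linf_box(x: list[float], eps: float) -> list[tuple[float, float]]:
    """Per-input interval [x_i - eps, x_i + eps]."""
    return [(xi - eps, xi + eps) for xi in x]


def is_counter_example(
    candidate: list[float],
    x: list[float],
    eps: float,
    logits: list[float],
    target: int,
) -> bool:
    """True if `candidate` lies in the box and some non-target logit dominates the target.

    `logits` are the model outputs at `candidate`. Using >= for "dominates" is an
    assumption; the real encoding may use a strict inequality or a margin.
    """
    inside = all(lo <= c <= hi for c, (lo, hi) in zip(candidate, linf_box(x, eps)))
    dominated = any(
        logit >= logits[target] for k, logit in enumerate(logits) if k != target
    )
    return inside and dominated
```

A solver answering SAT has found some candidate for which this predicate holds (FALSIFIED); UNSAT means no such candidate exists anywhere in the box (VERIFIED).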

Per-logit output bounds (opt-in)

MarabouAssessor can additionally certify per-class logit ranges for each VERIFIED sample, populating RobustnessResult.output_bounds.

| Kwarg | Default | Meaning |
|---|---|---|
| compute_output_bounds | False | Enable bisection-via-SAT bound extraction after each VERIFIED verdict. |
| bound_search_range | 1e3 | Initial probe window [-range, +range] per output variable. |
| bound_tolerance | 1e-2 | Stop bisection when the certified interval narrows below this. |

Marabou exposes no native min/max objective, so bounds are extracted by binary search on setUpperBound / setLowerBound of each output variable. Per verified sample, the assessor runs up to 2 × K × (⌈log₂(2 × bound_search_range / bound_tolerance)⌉ + 2) extra Marabou solves — for K=10 classes with defaults that is up to ~400 extra solves per sample. FALSIFIED / UNKNOWN / ERROR samples are skipped (their rows in the stacked bounds tensor are NaN-padded). Inconclusive verdicts during bisection (TIMEOUT / UNKNOWN) break the search conservatively; the returned bound is the loosest still-certified value, never a falsely tight one. If every probe for a given class/mode is inconclusive the assessor emits a WARNING log so vacuous bounds are obvious.
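The solve budget and the conservative bisection described above can be sketched as follows. This is an illustration under stated assumptions rather than the assessor's implementation: is_reachable stands in for a Marabou query (True for SAT, False for UNSAT, None for an inconclusive result), and both function names are hypothetical.

```python
import math
from typing import Callable, Optional


def max_extra_solves(
    num_classes: int, bound_search_range: float = 1e3, bound_tolerance: float = 1e-2
) -> int:
    """Upper limit 2 * K * (ceil(log2(2 * range / tolerance)) + 2) from the text above."""
    steps = math.ceil(math.log2(2 * bound_search_range / bound_tolerance))
    return 2 * num_classes * (steps + 2)


def certified_upper_bound(
    is_reachable: Callable[[float], Optional[bool]],
    bound_search_range: float = 1e3,
    bound_tolerance: float = 1e-2,
) -> float:
    """Bisect for the tightest value the output provably cannot reach.

    `is_reachable(c)` is a hypothetical oracle: True means the output can reach c
    inside the input box (SAT), False means it cannot (UNSAT), None is inconclusive.
    """
    lo, hi = -bound_search_range, bound_search_range
    if is_reachable(hi) is not False:
        return math.inf  # even the window edge is not certified: vacuous bound
    while hi - lo > bound_tolerance:
        mid = (lo + hi) / 2
        answer = is_reachable(mid)
        if answer is None:
            break  # stop conservatively; hi is still a certified bound
        if answer:
            lo = mid
        else:
            hi = mid
    return hi


print(max_extra_solves(10))  # K=10 with the default range and tolerance
```

The returned value only ever moves to a point the oracle has answered UNSAT for, so an inconclusive probe can leave the bound loose but never falsely tight.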

auto-LiRPA

AutoLiRPAAssessor (registry auto_lirpa) wraps auto-LiRPA — a sound but incomplete verifier that propagates certified per-class logit bounds (CROWN / IBP) directly over a torch model. Unlike Marabou it needs no ONNX export and scales to CNNs and L2 / L∞ budgets. Torch backend only (needs autograd + the live nn.Module); ONNX backends are rejected.

Verdicts are VERIFIED (lb[true] > max(ub[other classes])) or UNKNOWN, never FALSIFIED (the verifier is sound but incomplete, so it produces no counter-example). Certified lower_bounds / upper_bounds populate RobustnessResult.output_bounds for both VERIFIED and UNKNOWN samples (the bounds are the certificate), so all FORMAL_VERIFICATION visualisers above apply.
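The verdict rule stated above is small enough to write out directly. The sketch below works on plain lists of certified bounds; the function name is hypothetical and the snippet does not call auto-LiRPA.

```python
def verdict_from_bounds(lower: list[float], upper: list[float], true_class: int) -> str:
    """VERIFIED if the true class's lower bound beats every other class's upper bound."""
    worst_other = max(ub for k, ub in enumerate(upper) if k != true_class)
    return "VERIFIED" if lower[true_class] > worst_other else "UNKNOWN"


# Class 0 is certified: its lower bound 2.0 exceeds the other upper bounds (1.5, 0.3).
print(verdict_from_bounds([2.0, -1.0, -2.0], [4.0, 1.5, 0.3], true_class=0))

# Overlapping intervals prove nothing either way, hence UNKNOWN rather than FALSIFIED.
print(verdict_from_bounds([1.0, -1.0, -2.0], [4.0, 1.5, 0.3], true_class=0))
```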

| algorithm | Method | Norm |
|---|---|---|
| crown (default) | CROWN (backward) | L∞ |
| ibp | IBP (interval) | L∞ |
| crown-ibp | CROWN-IBP (hybrid) | L∞ |
| crown-l2 | CROWN (backward) | L2 |

constructor.epsilon sets the default budget radius (overridden per-call by eps). Install: uv sync --extra auto-lirpa (git-only; resolved from GitHub master — see below). It is not part of the robustness umbrella.

Note

auto-LiRPA has no upstream Intel XPU support. The adapter runs on the active backend's device but emits a warning on an Intel XPU backend (less-common ops may hit XPU gaps); fall back to a CPU backend if you hit operator not implemented for XPU. auto-LiRPA also has no PyPI release supporting torch 2.x, so it installs from GitHub master and pins the project to the torch 2.8 window — see Contributing to the robustness module.

ImageCorruptions

ImageCorruptionsAssessor (registry imagecorruptions) wraps the imagecorruptions library to apply one of 19 corruptions at a chosen severity. It estimates average-case accuracy under the corruption distribution, not a per-input adversarial verdict. threat_model is NOT_APPLICABLE (no adversary).

Install: uv add "raitap[imagecorruptions]" (or "raitap[robustness]" to include all robustness libraries).

| Config key | Values |
|---|---|
| algorithm | One of the 19 corruptions below |
| constructor.severity | Integer 1..5 |
| raitap.ci_method | wilson (default) or clopper_pearson |
| raitap.ci_level | float, default 0.95 |

The 19 supported algorithm values (grouped by family):

  • Noise: gaussian_noise, shot_noise, impulse_noise

  • Blur: defocus_blur, glass_blur, motion_blur, zoom_blur

  • Weather: snow, frost, fog, brightness

  • Digital: contrast, elastic_transform, pixelate, jpeg_compression

  • Holdout (ImageNet-C extended set): speckle_noise, gaussian_blur, spatter, saturate

Output is corrupted_accuracy plus a binomial CI (accuracy_ci_low, accuracy_ci_high, n_samples, n_correct) in RobustnessMetrics. Per-sample verdicts are CORRECT_UNDER_PERTURBATION / MISCLASSIFIED_UNDER_PERTURBATION.
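For readers who want to sanity-check the reported interval, here is the textbook Wilson score interval in standard-library Python. It is a reference sketch: the function name is hypothetical, and RAITAP's own computation may differ in details such as rounding or edge-case handling.

```python
import math
from statistics import NormalDist


def wilson_interval(
    n_correct: int, n_samples: int, ci_level: float = 0.95
) -> tuple[float, float]:
    """Two-sided Wilson score interval for a binomial proportion."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    z = NormalDist().inv_cdf(0.5 + ci_level / 2)  # about 1.96 for a 95% level
    p = n_correct / n_samples
    denom = 1 + z * z / n_samples
    centre = (p + z * z / (2 * n_samples)) / denom
    half = z * math.sqrt(p * (1 - p) / n_samples + z * z / (4 * n_samples**2)) / denom
    return centre - half, centre + half


low, high = wilson_interval(n_correct=80, n_samples=100)
print(f"accuracy 0.80, 95% CI [{low:.4f}, {high:.4f}]")
```

Unlike the plain normal approximation, the Wilson interval stays inside [0, 1] and remains usable when the accuracy is close to 0 or 1.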

Third-party adapters

Third-party adapters published to PyPI can register under the raitap.adapters entry-point group and are auto-discovered at config-registration time. Once installed they appear alongside in-tree assessors: +robustness=myattack in the CLI or from raitap.robustness import myattack in Python. See Writing a plugin.