Supply-Chain ML-BOM

A supply-chain attack swaps a model, dataset, or dependency for a poisoned one (a backdoored checkpoint on a model hub, a tampered training set). The defence is provenance: record every component's SHA-256 digest at a known-good point, then re-verify the deployed artefacts to detect substitution (OWASP ASVS / ML supply-chain controls).
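The pin-then-re-verify idea can be sketched with the standard library alone. This illustrates the principle and is not director_ai's implementation; the helper names `sha256_of_file` and `is_unmodified` are hypothetical, and a temporary file stands in for a real checkpoint.

```python
import hashlib
import hmac
import tempfile
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large checkpoints need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unmodified(path: Path, pinned_digest: str) -> bool:
    """Compare the file's current digest with the one recorded at the known-good point."""
    return hmac.compare_digest(sha256_of_file(path), pinned_digest)


# Demo with a throwaway file standing in for a model checkpoint.
with tempfile.TemporaryDirectory() as tmp:
    artefact = Path(tmp) / "model.bin"
    artefact.write_bytes(b"known-good weights")
    pinned = sha256_of_file(artefact)            # record at the known-good point

    unchanged_ok = is_unmodified(artefact, pinned)

    artefact.write_bytes(b"backdoored weights")  # simulate substitution
    swapped_ok = is_unmodified(artefact, pinned)

print(unchanged_ok, swapped_ok)
```

The pinned digest only protects anything if it is stored somewhere the attacker cannot rewrite along with the artefact.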

Quick start

from director_ai import ProductionGuard
from director_ai.core.config import DirectorConfig
from director_ai.core.ml_bom import ComponentType

bom = ProductionGuard(DirectorConfig()).ml_bom

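# model_bytes and dataset_bytes are the raw bytes of the known-good artefacts
# (for example, read from disk); they are not defined in this snippet.
# Pin each component to the digest of its known-good bytes.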
bom.add_artifact("factcg-onnx", "1.0", ComponentType.MODEL, model_bytes,
                 supplier="anulum", source="hf://anulum/factcg")
bom.add_artifact("aggrefact", "2024", ComponentType.DATASET, dataset_bytes)

print(bom.bom_digest)        # 64-hex fingerprint of the whole inventory

# Later, re-verify what is actually deployed
# (deployed_model_bytes: the bytes of the artefact currently being served).
report = bom.verify({"factcg-onnx": deployed_model_bytes})
print(report.ok)             # False if any supplied artefact was substituted
print(report.to_dict())      # {"ok", "intact", "tampered", "unverified"}

Components

An MLBOMComponent records name, version, component_type, sha256, and optional supplier / source / license. The type is one of:

| ComponentType | Tracks |
|---------------|--------|
| MODEL | A weight artefact (checkpoint, ONNX, safetensors). |
| DATASET | A training/evaluation dataset. |
| DEPENDENCY | A software package. |
| CODE | A first-party source artefact or build output. |

add_artifact(name, version, type, data, **metadata) hashes data and pins the resulting SHA-256 for you; add(component) records a pre-built component. A duplicate name is rejected. MLBOMComponent.matches(data) re-derives the digest of data and compares it with the recorded one; a mismatch indicates substitution or poisoning.
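The pin-and-match behaviour described above can be sketched as follows. `SketchComponent` and `from_bytes` are hypothetical stand-ins written for this illustration; they are not the library's `MLBOMComponent` or `add_artifact`, whose internals may differ.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SketchComponent:
    """Hypothetical stand-in for a BOM component; not the library's class."""
    name: str
    version: str
    component_type: str
    sha256: str
    supplier: Optional[str] = None

    @classmethod
    def from_bytes(cls, name, version, component_type, data: bytes, **metadata):
        # Digest the bytes once, at the known-good point, and pin the result.
        return cls(name, version, component_type,
                   hashlib.sha256(data).hexdigest(), **metadata)

    def matches(self, data: bytes) -> bool:
        # Re-derive the digest of the supplied bytes and compare with the pin.
        return hmac.compare_digest(hashlib.sha256(data).hexdigest(), self.sha256)


component = SketchComponent.from_bytes(
    "factcg-onnx", "1.0", "model", b"good weights", supplier="anulum"
)
same = component.matches(b"good weights")
swapped = component.matches(b"evil weights")
print(same, swapped)
```

Making the sketch class frozen means the pinned digest cannot be reassigned after the component is created, which is one plausible way to keep a pin from drifting.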

Tamper-evidence and verification

bom_digest is a SHA-256 over the canonical component list: any change to a component (or the set of components) changes it, so a trusted copy of the digest makes the whole inventory tamper-evident.
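One way to obtain such a fingerprint is to hash a canonical JSON encoding of the components. The sketch below is an assumption about what "canonical" could mean (components sorted by name, keys sorted, compact separators); the library's actual encoding is not documented here, and `inventory_digest` is a hypothetical helper.

```python
import hashlib
import json


def inventory_digest(components: list[dict]) -> str:
    """SHA-256 over a canonical JSON encoding of the component list."""
    # Sort components by name and sort keys, so the encoding does not
    # depend on insertion order (an assumption of this sketch).
    canonical = json.dumps(
        sorted(components, key=lambda c: c["name"]),
        sort_keys=True,
        separators=(",", ":"),
    ).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


model = {"name": "factcg-onnx", "version": "1.0",
         "sha256": hashlib.sha256(b"weights").hexdigest()}
data = {"name": "aggrefact", "version": "2024",
        "sha256": hashlib.sha256(b"rows").hexdigest()}

baseline = inventory_digest([model, data])
reordered = inventory_digest([data, model])                      # same set, other order
bumped = inventory_digest([{**model, "version": "1.1"}, data])   # one field changed
shrunk = inventory_digest([model])                               # one component removed
print(baseline == reordered, baseline != bumped, baseline != shrunk)
```

In this sketch, editing any field or adding or removing a component changes the fingerprint, while reordering the same components does not.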

verify(actuals) takes a mapping of component name → deployed bytes and sorts each name into one of three buckets:

| Bucket | Meaning |
|--------|---------|
| intact | Supplied bytes match the recorded digest. |
| tampered | Supplied bytes differ (poisoned), or an unknown name not in the inventory. |
| unverified | A recorded component for which no bytes were supplied. |

report.ok is True only when nothing is tampered; an unverified component is not a failure, it simply was not checked this pass. Both to_dict() serialisations are tenant-safe: they contain names, versions, digests, and suppliers only, never the artefact bytes.
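The three-bucket classification and the `ok` rule can be sketched in a few lines. `SketchReport` and this `verify` function are hypothetical illustrations of the rules stated above, not the library's code; the pinned inventory is reduced to a plain name-to-digest dict.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class SketchReport:
    """Hypothetical report type mirroring the three buckets."""
    intact: list = field(default_factory=list)
    tampered: list = field(default_factory=list)
    unverified: list = field(default_factory=list)

    @property
    def ok(self) -> bool:
        # Only tampered entries fail the check; unverified ones do not.
        return not self.tampered


def verify(pinned: dict, actuals: dict) -> SketchReport:
    """Classify supplied bytes against a name -> pinned SHA-256 mapping."""
    report = SketchReport()
    for name, data in actuals.items():
        digest = hashlib.sha256(data).hexdigest()
        if pinned.get(name) == digest:
            report.intact.append(name)
        else:
            # Either the digest differs or the name is not in the inventory.
            report.tampered.append(name)
    report.unverified = sorted(set(pinned) - set(actuals))
    return report


pinned = {
    "factcg-onnx": hashlib.sha256(b"weights").hexdigest(),
    "aggrefact": hashlib.sha256(b"rows").hexdigest(),
}
clean = verify(pinned, {"factcg-onnx": b"weights"})
dirty = verify(pinned, {"factcg-onnx": b"poisoned", "rogue-plugin": b"x"})
print(clean.ok, clean.unverified, dirty.ok, dirty.tampered)
```

Because unverified components do not fail the check, a caller that needs full coverage would also have to confirm that the unverified bucket is empty.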