Supply-Chain ML-BOM

A supply-chain attack swaps a model, dataset, or dependency for a poisoned one (a backdoored checkpoint on a model hub, a tampered training set). The defence is provenance: record every component's SHA-256 digest at a known-good point, then re-verify the deployed artefacts to detect substitution (OWASP ASVS / ML supply-chain controls).

Quick start

```python
from director_ai import ProductionGuard
from director_ai.core.config import DirectorConfig
from director_ai.core.ml_bom import ComponentType

bom = ProductionGuard(DirectorConfig()).ml_bom

# model_bytes, dataset_bytes and deployed_model_bytes are placeholders for
# `bytes` objects that you load yourself before running this snippet.

# Pin each component to the digest of its known-good bytes.
bom.add_artifact("factcg-onnx", "1.0", ComponentType.MODEL, model_bytes,
                 supplier="anulum", source="hf://anulum/factcg")
bom.add_artifact("aggrefact", "2024", ComponentType.DATASET, dataset_bytes)

print(bom.bom_digest)        # 64-hex fingerprint of the whole inventory

# Later, re-verify what is actually deployed.
report = bom.verify({"factcg-onnx": deployed_model_bytes})
print(report.ok)             # False if any supplied artefact was substituted
print(report.to_dict())      # keys: "ok", "intact", "tampered", "unverified"
```

Components

An MLBOMComponent records name, version, component_type, sha256, and optional supplier / source / license. The type is one of:

| `ComponentType` | Tracks |
| --- | --- |
| `MODEL` | A weight artefact (checkpoint, ONNX, safetensors). |
| `DATASET` | A training/evaluation dataset. |
| `DEPENDENCY` | A software package. |
| `CODE` | A first-party source artefact or build output. |

add_artifact(name, version, type, data, **metadata) digests data and pins the SHA-256 for you; add(component) records a pre-built component. A duplicate name is rejected. MLBOMComponent.matches(data) re-derives the digest of data and compares it with the recorded one; a mismatch indicates substitution or poisoning.
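The pin-then-compare idea behind add_artifact and matches can be sketched with the standard library alone. This is an illustration, not director_ai's implementation: `PinnedComponent` and `pin` are hypothetical names, and the constant-time comparison is a choice made for this sketch.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class PinnedComponent:
    """Hypothetical stand-in for a BOM entry; not the library's class."""
    name: str
    version: str
    sha256: str  # lowercase hex digest of the known-good bytes

    def matches(self, data: bytes) -> bool:
        # Re-derive the digest of the supplied bytes and compare it
        # with the pinned digest in constant time.
        actual = hashlib.sha256(data).hexdigest()
        return hmac.compare_digest(actual, self.sha256)


def pin(name: str, version: str, data: bytes) -> PinnedComponent:
    """Digest the bytes once, at the known-good point."""
    return PinnedComponent(name, version, hashlib.sha256(data).hexdigest())


good = b"known-good model weights"
component = pin("factcg-onnx", "1.0", good)

print(component.matches(good))            # True: same bytes
print(component.matches(good + b"\x00"))  # False: one extra byte
```

Only the digest is stored, so the inventory stays small however large the artefact is, and even a single changed byte produces a different digest.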

Tamper-evidence and verification

bom_digest is a SHA-256 over the canonical component list: any change to a component (or the set of components) changes it, so a trusted copy of the digest makes the whole inventory tamper-evident.
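To make the "digest over a canonical list" idea concrete, here is a minimal sketch. The canonical form that director_ai actually uses is not specified here, so the encoding below (components sorted by name, JSON with sorted keys and compact separators) and the function name `inventory_digest` are assumptions for illustration only.

```python
import hashlib
import json


def inventory_digest(components: list) -> str:
    """SHA-256 over an assumed canonical JSON encoding of the component list."""
    # Sort entries by name and keys alphabetically so that insertion order
    # and dict ordering cannot change the fingerprint.
    ordered = sorted(components, key=lambda c: c["name"])
    canonical = json.dumps(ordered, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


model = {"name": "factcg-onnx", "version": "1.0", "type": "model",
         "sha256": hashlib.sha256(b"model bytes").hexdigest()}
dataset = {"name": "aggrefact", "version": "2024", "type": "dataset",
           "sha256": hashlib.sha256(b"dataset bytes").hexdigest()}

# Store this value somewhere an attacker cannot rewrite it.
trusted = inventory_digest([model, dataset])

# In this sketch, insertion order does not matter...
same_when_reordered = inventory_digest([dataset, model]) == trusted
# ...but editing a field or dropping a component changes the fingerprint.
bumped = inventory_digest([{**model, "version": "1.1"}, dataset])
dropped = inventory_digest([model])

print(same_when_reordered, bumped != trusted, dropped != trusted)  # True True True
```

The fingerprint is only as trustworthy as the place the trusted copy is kept: if an attacker can rewrite both the inventory and the stored digest, the comparison proves nothing.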

verify(actuals) takes a mapping of component name → deployed bytes and classifies each component:

| Bucket | Meaning |
| --- | --- |
| `intact` | Supplied bytes match the recorded digest. |
| `tampered` | Supplied bytes differ from the recorded digest (possible poisoning), or the supplied name is not in the inventory. |
| `unverified` | A recorded component for which no bytes were supplied. |

report.ok is True only when nothing is tampered (an unverified component is not a failure — it simply was not checked this pass). Both to_dict() serialisations are tenant-safe: names, versions, digests, and suppliers only, never the artefact bytes.
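The three buckets and the ok rule can be sketched as a small standalone classifier. It mirrors the behaviour described above but is not the library's code: `Report` and `classify` are hypothetical names, and the inventory is reduced to a plain name → digest mapping.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Report:
    intact: List[str] = field(default_factory=list)
    tampered: List[str] = field(default_factory=list)
    unverified: List[str] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        # Unverified entries do not fail the check; only tampered ones do.
        return not self.tampered


def classify(pinned: Dict[str, str], actuals: Dict[str, bytes]) -> Report:
    report = Report()
    for name, digest in pinned.items():
        if name not in actuals:
            report.unverified.append(name)   # recorded, but not checked this pass
        elif hashlib.sha256(actuals[name]).hexdigest() == digest:
            report.intact.append(name)       # bytes match the pinned digest
        else:
            report.tampered.append(name)     # bytes differ from the pinned digest
    # Names that were supplied but never recorded also count as tampered.
    report.tampered.extend(sorted(n for n in actuals if n not in pinned))
    return report


pinned = {
    "factcg-onnx": hashlib.sha256(b"good model").hexdigest(),
    "aggrefact": hashlib.sha256(b"good dataset").hexdigest(),
}

partial = classify(pinned, {"factcg-onnx": b"good model"})
swapped = classify(pinned, {"factcg-onnx": b"backdoored model", "extra": b"?"})

print(partial.ok, partial.intact, partial.unverified)
# True ['factcg-onnx'] ['aggrefact']
print(swapped.ok, swapped.tampered)
# False ['factcg-onnx', 'extra']
```

Because an unverified component does not fail the check, a caller that needs full coverage would also have to confirm that the unverified list is empty.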