Multimodal Checks

Multimodal checks adapt image, audio, and video evidence into the shared GuardDecision and SafetyEvent contracts. The adapter is explicitly opt-in: a modality must be enabled before it can be checked, and it must also be marked as benchmarked before a supported result can map to allow.

Decision Boundary

MultimodalVerifierAdapter enforces production-safe defaults:

disabled or unsupported modalities raise errors instead of silently passing
uncertain evidence maps to warn, never allow
hallucinated or temporally inconsistent evidence maps to halt
unbenchmarked modalities map to warn even when the low-level checker says the claim is consistent
optional caption and metadata grounding can reduce a modality score before a decision is emitted
audit payloads and safety events include media references, not raw media, transcripts, frame data, captions, metadata values, or claim text
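The mapping rules above can be sketched as a small standalone function. This is a minimal illustration, not the adapter's implementation: the `Verdict` enum, the `map_decision` helper, and the use of `ValueError` are all hypothetical names chosen for this example, and grounding adjustments are left out.

```python
from enum import Enum


class Verdict(str, Enum):
    # Hypothetical low-level checker verdicts, named after the rules above.
    CONSISTENT = "consistent"
    UNCERTAIN = "uncertain"
    HALLUCINATED = "hallucinated"
    TEMPORALLY_INCONSISTENT = "temporally_inconsistent"


def map_decision(modality, verdict, enabled, benchmarked):
    """Map a checker verdict to 'allow', 'warn', or 'halt'."""
    if modality not in enabled:
        # Disabled or unsupported modalities fail loudly instead of passing.
        # The exception type is an assumption made for this sketch.
        raise ValueError(f"modality {modality!r} is not enabled")
    if verdict in (Verdict.HALLUCINATED, Verdict.TEMPORALLY_INCONSISTENT):
        return "halt"
    if verdict is Verdict.UNCERTAIN:
        # Uncertain evidence never becomes allow.
        return "warn"
    # Consistent evidence only becomes allow for benchmarked modalities.
    return "allow" if modality in benchmarked else "warn"
```

Under these assumptions, a consistent image claim maps to allow only when "image" appears in both the enabled and the benchmarked sets; with an empty benchmarked set the same claim maps to warn.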

from director_ai.core.guard_control import RiskEnvelope
from director_ai.core.multimodal_guard import (
    MultimodalCheckRequest,
    MultimodalVerifierAdapter,
)

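# image_guard, caption_grounder, metadata_grounder, and image_bytes are
# assumed to be defined by the caller before this point.
adapter = MultimodalVerifierAdapter(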
    image_guard=image_guard,
    caption_score_fn=caption_grounder,
    metadata_score_fn=metadata_grounder,
    enabled_modalities=("image",),
    benchmarked_modalities=("image",),
)

result = adapter.check(
    MultimodalCheckRequest(
        modality="image",
        claim_text="The image shows a labelled package.",
        media_ref="media://image-42",
        image_bytes=image_bytes,
        caption_text="Package label is absent.",
        metadata={"captured_at": "2026-05-13", "source": "inspection-rig"},
    ),
    risk_envelope=RiskEnvelope(
        action_category="multimodal",
        reversibility="reversible",
        domain="regulated",
        calibrated_threshold=0.5,
        no_go_threshold=0.85,
    ),
    policy_id="policy.multimodal.regulated",
)

Grounding callbacks receive either (caption_text, claim_text) or (metadata, claim_text) and must return a finite score in [0, 1]. A score below grounding_floor (default 0.4) halts the claim; a score below grounding_allow_threshold (default 0.75) produces a warning unless the base verifier has already reached a stricter verdict. Evidence references carry suffixes such as #caption and #metadata:captured_at, so downstream audit logs can identify which grounding channel was used without storing private captions or metadata values.
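A rough sketch of how a caption callback and the two grounding thresholds might interact is shown below. The word-overlap scorer is a toy stand-in for a real grounder, and `grounding_verdict`, `combine`, and `evidence_ref` are hypothetical helpers written for this example. The threshold values are the constructor defaults listed in the API section; building the reference by appending the suffix to media_ref is an assumption.

```python
import math

GROUNDING_FLOOR = 0.4             # constructor default: grounding_floor
GROUNDING_ALLOW_THRESHOLD = 0.75  # constructor default: grounding_allow_threshold
_SEVERITY = {"allow": 0, "warn": 1, "halt": 2}


def _words(text: str) -> set[str]:
    # Lower-case tokens with trailing punctuation stripped.
    return {w.strip(".,").lower() for w in text.split()} - {""}


def caption_grounder(caption_text: str, claim_text: str) -> float:
    """Toy callback: fraction of claim words that also occur in the caption."""
    claim_words = _words(claim_text)
    if not claim_words:
        return 0.0
    return len(claim_words & _words(caption_text)) / len(claim_words)


def grounding_verdict(score: float) -> str:
    """Apply the floor and allow threshold to a callback score."""
    if not math.isfinite(score) or not 0.0 <= score <= 1.0:
        raise ValueError("grounding score must be finite and in [0, 1]")
    if score < GROUNDING_FLOOR:
        return "halt"
    if score < GROUNDING_ALLOW_THRESHOLD:
        return "warn"
    return "allow"


def combine(base_verdict: str, grounding: str) -> str:
    """Keep whichever verdict is stricter (halt > warn > allow)."""
    return max(base_verdict, grounding, key=_SEVERITY.__getitem__)


def evidence_ref(media_ref: str, channel: str, key: str = "") -> str:
    """Build a reference that names the channel without its private value."""
    suffix = f"{channel}:{key}" if key else channel
    return f"{media_ref}#{suffix}"
```

With the caption and claim from the example above, the toy scorer finds one shared word out of six claim words, which falls below the floor, so the grounding verdict is halt regardless of what the base verifier reported.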

Full API

director_ai.core.multimodal_guard.adapter.MultimodalCheckRequest `dataclass`

MultimodalCheckRequest(modality: Modality, claim_text: str, media_ref: str, image_bytes: bytes = b'', transcript_text: str = '', frame_similarities: Sequence[float] = (), caption_text: str = '', metadata: Mapping[str, str] = dict())

Input envelope for opt-in multimodal verification.

director_ai.core.multimodal_guard.adapter.MultimodalCheckResult `dataclass`

MultimodalCheckResult(request: MultimodalCheckRequest, signal: VerifierSignal, guard_decision: GuardDecision)

Tenant-safe multimodal verification result.

to_dict

to_dict() -> dict[str, Any]

Serialise without raw media, transcript, or claim text.
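The redaction rule can be illustrated with a standalone sketch. It does not reproduce the library's payload layout, which is not documented here: `RequestSketch`, `to_safe_dict`, the `decision` key, and the set of redacted field names are assumptions for illustration, loosely modelled on the request fields listed above.

```python
from dataclasses import dataclass, fields

# Hypothetical list of fields treated as raw or private content.
_RAW_FIELDS = frozenset(
    {"claim_text", "image_bytes", "transcript_text", "caption_text"}
)


@dataclass(frozen=True)
class RequestSketch:
    # Simplified stand-in for the request envelope, not the real class.
    modality: str
    claim_text: str
    media_ref: str
    image_bytes: bytes = b""
    transcript_text: str = ""
    caption_text: str = ""


def to_safe_dict(request: RequestSketch, decision: str) -> dict:
    """Serialise the reference and decision, dropping raw or private fields."""
    safe = {
        f.name: getattr(request, f.name)
        for f in fields(request)
        if f.name not in _RAW_FIELDS
    }
    safe["decision"] = decision
    return safe
```

In this sketch only the modality, the media reference, and the decision survive serialisation, so an audit log can point at the media without holding a copy of it.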

to_safety_event

to_safety_event(*, hook_id: str, hook_scope: str = 'agent', request_id: str = '', tenant_id: str = '', latency_ms: float | None = None) -> SafetyEvent

Convert the decision into the shared tenant-safe event schema.

director_ai.core.multimodal_guard.adapter.MultimodalVerifierAdapter

MultimodalVerifierAdapter(*, image_guard: MultimodalGuard | Any | None = None, audio_score_fn: Callable[[str, str], float] | None = None, caption_score_fn: Callable[[str, str], float] | None = None, metadata_score_fn: Callable[[Mapping[str, str], str], float] | None = None, enabled_modalities: Sequence[str] = (), benchmarked_modalities: Sequence[str] = (), temporal_alpha: float = 0.5, temporal_floor: float = 0.2, grounding_floor: float = 0.4, grounding_allow_threshold: float = 0.75)

Opt-in adapter from modality-specific checks to guard decisions.

check

check(request: MultimodalCheckRequest, *, risk_envelope: RiskEnvelope, policy_id: str) -> MultimodalCheckResult

Run the modality check and return a shared guard decision.