
Multimodal Checks

Multimodal checks adapt image, audio, and video evidence into the shared GuardDecision and SafetyEvent contracts. The adapter is explicitly opt-in: a modality must be enabled before it can be checked, and it must be marked as benchmarked before a supported result can map to allow.

Decision Boundary

MultimodalVerifierAdapter enforces production-safe defaults:

  • disabled or unsupported modalities raise errors instead of silently passing
  • uncertain evidence maps to warn, never allow
  • hallucinated or temporally inconsistent evidence maps to halt
  • unbenchmarked modalities map to warn even when the low-level checker says the claim is consistent
  • optional caption and metadata grounding can reduce a modality score before a decision is emitted
  • audit payloads and safety events include media references, not raw media, transcripts, frame data, captions, metadata values, or claim text
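
from director_ai.core.guard_control import RiskEnvelope
from director_ai.core.multimodal_guard import (
    MultimodalCheckRequest,
    MultimodalVerifierAdapter,
)

# image_guard, caption_grounder, metadata_grounder, and image_bytes are
# placeholders for objects supplied by the caller.
adapter = MultimodalVerifierAdapter(
    image_guard=image_guard,
    caption_score_fn=caption_grounder,    # (caption_text, claim_text) -> float in [0, 1]
    metadata_score_fn=metadata_grounder,  # (metadata, claim_text) -> float in [0, 1]
    enabled_modalities=("image",),        # a modality must be enabled before it is checked
    benchmarked_modalities=("image",),    # and benchmarked before a result can map to allow
)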

result = adapter.check(
    MultimodalCheckRequest(
        modality="image",
        claim_text="The image shows a labelled package.",
        media_ref="media://image-42",
        image_bytes=image_bytes,
        # The caption contradicts the claim, so caption grounding can lower the image score.
        caption_text="Package label is absent.",
        metadata={"captured_at": "2026-05-13", "source": "inspection-rig"},
    ),
    risk_envelope=RiskEnvelope(
        action_category="multimodal",
        reversibility="reversible",
        domain="regulated",
        calibrated_threshold=0.5,
        no_go_threshold=0.85,
    ),
    policy_id="policy.multimodal.regulated",
)
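The decision boundary listed above can be modeled as a small pure function. This is a minimal sketch, not the adapter's implementation: the `Verdict` enum, the `map_decision` helper, the set of supported modalities, and the choice of `ValueError` for disabled or unsupported modalities are all hypothetical stand-ins for the real `VerifierSignal` and error types.

```python
from enum import Enum


class Verdict(str, Enum):
    # Hypothetical low-level checker verdicts; the real VerifierSignal may differ.
    CONSISTENT = "consistent"
    UNCERTAIN = "uncertain"
    HALLUCINATED = "hallucinated"
    TEMPORALLY_INCONSISTENT = "temporally_inconsistent"


# Assumed set of modalities the adapter knows how to check.
SUPPORTED_MODALITIES = frozenset({"image", "audio", "video"})


def map_decision(
    modality: str,
    verdict: Verdict,
    enabled: frozenset[str],
    benchmarked: frozenset[str],
) -> str:
    """Map a checker verdict to allow / warn / halt under the documented defaults."""
    # Unsupported or disabled modalities fail loudly instead of silently passing.
    if modality not in SUPPORTED_MODALITIES:
        raise ValueError(f"unsupported modality: {modality!r}")
    if modality not in enabled:
        raise ValueError(f"modality not enabled: {modality!r}")
    # Hallucinated or temporally inconsistent evidence always halts.
    if verdict in (Verdict.HALLUCINATED, Verdict.TEMPORALLY_INCONSISTENT):
        return "halt"
    # Uncertain evidence never becomes allow.
    if verdict is Verdict.UNCERTAIN:
        return "warn"
    # A consistent result only maps to allow on a benchmarked modality.
    return "allow" if modality in benchmarked else "warn"
```

Ordering the checks from "raise" through "halt" to "warn" means a permissive outcome is only reachable once every stricter rule has been ruled out, which mirrors the fail-closed defaults described above.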

Grounding callbacks receive either (caption_text, claim_text) or (metadata, claim_text) and must return a finite score in [0, 1]. A score below grounding_floor (default 0.4) halts the claim; a score below grounding_allow_threshold (default 0.75) produces a warning unless the base verifier has already reached a stricter verdict. Evidence references carry suffixes such as #caption and #metadata:captured_at, so that downstream audit logs can identify which grounding channel was used without storing private captions or metadata values.
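The callback contract and threshold rules above can be sketched as follows. This is illustrative only: `caption_grounder` and `metadata_grounder` are toy scorers (token overlap and required-key presence) standing in for real grounding models, and `apply_grounding` and `evidence_ref` are hypothetical helpers. Prefixing the suffix with the media reference is also an assumption; only the `#caption` and `#metadata:<key>` suffixes come from this page.

```python
import math
import re
from collections.abc import Mapping

# Hypothetical values mirroring the grounding_floor and
# grounding_allow_threshold constructor defaults.
GROUNDING_FLOOR = 0.4
GROUNDING_ALLOW_THRESHOLD = 0.75

# Ordering used to keep the stricter of two verdicts.
_SEVERITY = {"allow": 0, "warn": 1, "halt": 2}


def _tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def caption_grounder(caption_text: str, claim_text: str) -> float:
    """Toy scorer: share of claim tokens that also appear in the caption."""
    claim = _tokens(claim_text)
    if not claim:
        return 0.0
    return len(claim & _tokens(caption_text)) / len(claim)


def metadata_grounder(metadata: Mapping[str, str], claim_text: str) -> float:
    """Toy scorer: share of required provenance keys with a non-empty value.

    claim_text is accepted to match the documented callback shape but unused here.
    """
    required = ("captured_at", "source")
    return sum(1 for key in required if metadata.get(key)) / len(required)


def apply_grounding(base_decision: str, score: float) -> str:
    """Fold one grounding score into a base allow/warn/halt decision."""
    if not (math.isfinite(score) and 0.0 <= score <= 1.0):
        raise ValueError(f"grounding score must be finite and in [0, 1], got {score!r}")
    if score < GROUNDING_FLOOR:
        grounded = "halt"
    elif score < GROUNDING_ALLOW_THRESHOLD:
        grounded = "warn"
    else:
        grounded = "allow"
    # Never relax a verdict the base verifier already made stricter.
    return max(base_decision, grounded, key=_SEVERITY.__getitem__)


def evidence_ref(media_ref: str, channel: str, key: str = "") -> str:
    """Build a reference such as media://image-42#metadata:captured_at."""
    return f"{media_ref}#{channel}:{key}" if key else f"{media_ref}#{channel}"
```

With the example request earlier on this page, the toy caption scorer shares only "package" between claim and caption (1 of 6 claim tokens, about 0.17), which falls below the floor and halts. A production scorer would typically use an entailment model rather than token overlap.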

Full API

director_ai.core.multimodal_guard.adapter.MultimodalCheckRequest dataclass

MultimodalCheckRequest(modality: Modality, claim_text: str, media_ref: str, image_bytes: bytes = b'', transcript_text: str = '', frame_similarities: Sequence[float] = (), caption_text: str = '', metadata: Mapping[str, str] = dict())

Input envelope for opt-in multimodal verification.

director_ai.core.multimodal_guard.adapter.MultimodalCheckResult dataclass

MultimodalCheckResult(request: MultimodalCheckRequest, signal: VerifierSignal, guard_decision: GuardDecision)

Tenant-safe multimodal verification result.

to_dict

to_dict() -> dict[str, Any]

Serialise without raw media, transcript, or claim text.
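One way to get the redaction property described for to_dict is an allow-list serialiser that copies only named, non-sensitive fields. The sketch below uses a hypothetical `RequestSketch` dataclass (a subset of the request fields) and a hypothetical `to_safe_dict` helper; the output key names are assumptions, not the library's schema.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class RequestSketch:
    """Hypothetical stand-in for MultimodalCheckRequest (subset of its fields)."""
    modality: str
    claim_text: str
    media_ref: str
    image_bytes: bytes = b""
    transcript_text: str = ""
    caption_text: str = ""
    metadata: dict[str, str] = field(default_factory=dict)


def to_safe_dict(request: RequestSketch, decision: str) -> dict[str, Any]:
    """Serialise only references and the decision; raw content is never copied."""
    return {
        "modality": request.modality,
        "media_ref": request.media_ref,
        "decision": decision,
        # Keys only: records which metadata fields existed, not their values.
        "metadata_keys": sorted(request.metadata),
    }
```

Copying named safe fields, rather than deleting known sensitive ones, means a raw field added to the request later cannot leak by default.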

to_safety_event

to_safety_event(*, hook_id: str, hook_scope: str = 'agent', request_id: str = '', tenant_id: str = '', latency_ms: float | None = None) -> SafetyEvent

Convert the decision into the shared tenant-safe event schema.

director_ai.core.multimodal_guard.adapter.MultimodalVerifierAdapter

MultimodalVerifierAdapter(*, image_guard: MultimodalGuard | Any | None = None, audio_score_fn: Callable[[str, str], float] | None = None, caption_score_fn: Callable[[str, str], float] | None = None, metadata_score_fn: Callable[[Mapping[str, str], str], float] | None = None, enabled_modalities: Sequence[str] = (), benchmarked_modalities: Sequence[str] = (), temporal_alpha: float = 0.5, temporal_floor: float = 0.2, grounding_floor: float = 0.4, grounding_allow_threshold: float = 0.75)

Opt-in adapter from modality-specific checks to guard decisions.
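This page does not document how temporal_alpha and temporal_floor are applied to frame_similarities. The sketch below shows one plausible scheme, assuming temporal_alpha is an exponential-smoothing weight over adjacent-frame similarities and temporal_floor is the minimum smoothed similarity before a video is treated as temporally inconsistent. The `temporally_consistent` helper is hypothetical and should not be read as the adapter's behavior.

```python
import math
from collections.abc import Sequence


def temporally_consistent(
    frame_similarities: Sequence[float],
    alpha: float = 0.5,
    floor: float = 0.2,
) -> bool:
    """Hypothetical check: an exponential moving average of frame similarities
    must never drop below floor."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    if not frame_similarities:
        # Fail loudly rather than treating missing evidence as consistent.
        raise ValueError("no frame similarities supplied")
    smoothed = None
    for sim in frame_similarities:
        if not math.isfinite(sim):
            raise ValueError(f"non-finite frame similarity: {sim!r}")
        # Blend the new similarity into the running average.
        smoothed = sim if smoothed is None else alpha * sim + (1 - alpha) * smoothed
        if smoothed < floor:
            return False
    return True
```

Under this scheme a single low-similarity frame after a run of high ones is absorbed by smoothing, while sustained or early drops fall below the floor and would map to halt.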

check

check(request: MultimodalCheckRequest, *, risk_envelope: RiskEnvelope, policy_id: str) -> MultimodalCheckResult

Run the modality check and return a shared guard decision.