Skip to content

Continuous Guard Fuzzing

A static adversarial suite checks a fixed list of attacks; a fuzzer mutates a seed corpus round after round and surfaces the variant a guard fails to flag. The ContinuousFuzzer takes a guard predicate and a corpus of strings the guard should flag, applies obfuscating mutations, and reports every one that slipped through — plus any seed the guard missed outright. The RNG is seeded, so each finding is replayable as a regression case.

Quick start

from director_ai import ProductionGuard
from director_ai.core.config import DirectorConfig

fuzzer = ProductionGuard(DirectorConfig()).continuous_fuzzer(seed=0)

# predicate: True means "the guard flags this as an attack".
def guard_flags(text: str) -> bool:
    return my_injection_detector.is_attack(text)

report = fuzzer.run(guard_flags, rounds_per_seed=50)   # default attack corpus
print(report.ok)                 # False if any mutation bypassed the guard
for bypass in report.bypasses:
    print(bypass.operator, "→", bypass.mutation)        # replayable obfuscation

Mutation operators

Each operator perturbs an attack string while a human still reads the same intent:

Operator Obfuscation
case_flip Randomised upper/lower casing.
whitespace_inject Stray spaces, tabs, newlines between characters.
zero_width_inject Zero-width spaces/joiners/BOM sprinkled in.
homoglyph_substitute Latin letters → confusable Cyrillic homoglyphs.
leetspeak a→4, e→3, i→1, o→0, s→5, t→7.
char_duplicate Doubles a sample of characters.
delimiter_inject Splices chat/template delimiters (</s>, [INST], …).

Pass a custom mutators mapping to ContinuousFuzzer to add domain-specific operators.

The report

run(predicate, corpus=None, rounds_per_seed=25) returns a FuzzReport:

Field Meaning
seeds_tested Number of corpus seeds.
mutations_run Total mutations evaluated.
bypasses Bypass(operator, seed, mutation) for each mutation the guard missed.
seed_misses Seeds the guard failed to flag even unmutated (a baseline gap).
operators_used Which operators were exercised.
ok True only when there were no bypasses and no seed misses.

A seed the guard cannot even flag unmutated is reported as a seed_miss and is not mutated further — fix the baseline first. Wire run() into CI to fail the build when report.ok is False, turning bypass discovery into a standing regression gate. The bypass payloads are attack-corpus mutations, not tenant data, so they are safe to surface to the security team for triage.