Multilingual Corpus

Director-AI ships a deterministic multilingual evaluation corpus for EU-market regression testing. It is not a leaderboard claim; it is a fixture for catching locale regressions in factual consistency, numeric consistency, policy compliance, temporal freshness, and retrieval grounding.

Coverage

Property            Value
File                benchmarks/multilingual_corpus.jsonl
Rows                200
Languages           English, German, French, Spanish, Italian, Polish, Czech, Dutch
Rows per language   25
Labels              supported, contradicted
Expected decisions  allow, halt
Builder             tools/build_multilingual_corpus.py
Validator           tools/validate_multilingual_corpus.py

Every row contains:

  • id
  • language
  • language_name
  • category
  • prompt
  • source
  • response
  • label
  • expected_decision
  • risk_tags
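As a rough illustration of consuming this row shape, the sketch below parses JSONL lines and rejects any row where one of the ten fields listed above is missing or empty. The helper name `parse_rows` is hypothetical and is not part of the shipped tools; the field types are not stated in this page, so the check only tests for presence and non-emptiness.

```python
import json
from typing import Iterable

# The ten fields every corpus row is documented to contain.
REQUIRED_FIELDS = (
    "id", "language", "language_name", "category", "prompt",
    "source", "response", "label", "expected_decision", "risk_tags",
)


def parse_rows(lines: Iterable[str]) -> list[dict]:
    """Parse JSONL lines into dicts (hypothetical helper, not the project's API)."""
    rows = []
    for line_no, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines between records
        row = json.loads(line)
        # Treat an absent key, None, an empty string, or an empty list as "missing".
        missing = [name for name in REQUIRED_FIELDS if row.get(name) in (None, "", [])]
        if missing:
            raise ValueError(f"line {line_no}: missing or empty fields {missing}")
        rows.append(row)
    return rows
```

A caller could pass an open file handle directly, for example `parse_rows(open("benchmarks/multilingual_corpus.jsonl", encoding="utf-8"))`, since a text file iterates line by line.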

Validation

uv run --frozen python tools/validate_multilingual_corpus.py .

The validator fails when:

  • the corpus does not contain exactly 200 rows;
  • any of the 8 expected languages has anything other than 25 rows;
  • a required field is missing or empty;
  • row IDs do not match their language prefix;
  • labels and expected decisions disagree;
  • category or label coverage is incomplete;
  • risk tags are absent or malformed.
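Some of the failure conditions above can be sketched in a few lines of Python. This is an illustration only, not the code in tools/validate_multilingual_corpus.py: the function name `find_problems` is hypothetical, the pairing of `supported` with `allow` and `contradicted` with `halt` is an assumption (the page lists the values but not how they map), and `risk_tags` is assumed to be a non-empty list of non-empty strings. The ID-prefix and category-coverage checks are left out because their exact rules are not described here.

```python
from collections import Counter

EXPECTED_ROWS = 200
EXPECTED_LANGUAGES = 8
ROWS_PER_LANGUAGE = 25
# Assumed pairing of labels to decisions; not confirmed by the documentation.
DECISION_FOR_LABEL = {"supported": "allow", "contradicted": "halt"}


def find_problems(rows: list[dict]) -> list[str]:
    """Return human-readable problems; an empty list means these checks passed."""
    problems = []
    if len(rows) != EXPECTED_ROWS:
        problems.append(f"expected {EXPECTED_ROWS} rows, found {len(rows)}")

    # Count rows per language value and compare against the documented totals.
    per_language = Counter(row["language"] for row in rows)
    if len(per_language) != EXPECTED_LANGUAGES:
        problems.append(
            f"expected {EXPECTED_LANGUAGES} languages, found {len(per_language)}"
        )
    for language, count in sorted(per_language.items()):
        if count != ROWS_PER_LANGUAGE:
            problems.append(
                f"{language}: expected {ROWS_PER_LANGUAGE} rows, found {count}"
            )

    for row in rows:
        expected = DECISION_FOR_LABEL.get(row["label"])
        if expected is None:
            problems.append(f"{row['id']}: unknown label {row['label']!r}")
        elif row["expected_decision"] != expected:
            problems.append(
                f"{row['id']}: label {row['label']!r} disagrees with "
                f"decision {row['expected_decision']!r}"
            )
        tags = row["risk_tags"]
        well_formed = (
            isinstance(tags, list)
            and len(tags) > 0
            and all(isinstance(tag, str) and tag for tag in tags)
        )
        if not well_formed:
            problems.append(f"{row['id']}: risk_tags absent or malformed")
    return problems
```

Collecting every problem in a list, rather than raising on the first one, lets a single run report all regressions in a regenerated corpus.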

Regeneration

uv run --frozen python tools/build_multilingual_corpus.py
uv run --frozen python tools/validate_multilingual_corpus.py .

The builder is deterministic. Review generated diffs before publishing because the JSONL file is the benchmark fixture used by downstream regression checks.
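One way to spot-check the determinism claim locally is to hash the corpus file before and after a rebuild. The sketch below is an assumption-laden illustration: `sha256_of` and `is_rebuild_stable` are hypothetical helpers, not part of the repository, and the rebuild step is passed in as a callable so the check stays independent of how the builder is invoked.

```python
import hashlib
from pathlib import Path
from typing import Callable


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_rebuild_stable(path: Path, rebuild: Callable[[], None]) -> bool:
    """Hash the file, run the rebuild step, and report whether the bytes changed."""
    before = sha256_of(path)
    rebuild()
    return sha256_of(path) == before


# Possible usage (hypothetical wiring around the builder command shown above):
#
#   import subprocess
#   stable = is_rebuild_stable(
#       Path("benchmarks/multilingual_corpus.jsonl"),
#       lambda: subprocess.run(
#           ["uv", "run", "--frozen", "python", "tools/build_multilingual_corpus.py"],
#           check=True,
#       ),
#   )
```

A matching digest only shows that the output is byte-identical to the committed fixture on this machine; reviewing the diff is still needed whenever the builder itself has changed.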