
Determinism soak

The determinism soak test runs the full happy-path and failure-path flows N times under identical seeds and compares the resulting frame checksums. If any non-volatile frame diverges between iterations, the gate fails — and the divergence is almost always a latent bug in something “deterministic” that turned out not to be.

Source: scripts/doctor_frankentui_determinism_soak.sh.

Why a soak test

Individual tests can pass by luck. A stray HashMap iteration, a wall-clock read, a thread race that usually goes one way but occasionally goes the other — none of these are reliably caught by a single run. A soak test repeats the deterministic flow many times and insists every iteration produces the same ledger. If the system is honest about determinism, this is cheap; if it isn’t, the soak test finds out.

Running the soak

# Default: 3 iterations, timestamped run root
./scripts/doctor_frankentui_determinism_soak.sh

# Custom iteration count via env var
DOCTOR_FRANKENTUI_SOAK_RUNS=10 ./scripts/doctor_frankentui_determinism_soak.sh

# Explicit run root + iteration count as positional args
./scripts/doctor_frankentui_determinism_soak.sh /tmp/custom_root 10

The script exits non-zero on:

  • A required workflow script missing or not executable (doctor_frankentui_happy_e2e.sh, doctor_frankentui_failure_e2e.sh).
  • ITERATIONS not a positive integer.
  • The schema file missing (crates/doctor_frankentui/coverage/e2e_jsonl_schema.json).
  • Any non-volatile frame divergence across iterations.

Prerequisites

  • bash
  • python3 — for cross-iteration JSONL parsing.
  • jq — for JSON field extraction.
  • cargo — the underlying happy/failure scripts build the demo.

Output layout

/tmp/doctor_frankentui/determinism_soak_<TIMESTAMP>/
├── happy_run_1/          (full happy-path artifacts for iteration 1)
│   ├── logs/
│   ├── project/
│   └── meta/             (summary.json, events.jsonl, artifact_manifest.json, …)
├── happy_run_2/
├── happy_run_3/
├── failure_run_1/        (full failure-path artifacts for iteration 1)
├── failure_run_2/
├── failure_run_3/
├── logs/
│   ├── happy_run_1.stdout.log
│   ├── happy_run_1.stderr.log
│   └── …
└── meta/
    ├── run_index.tsv
    ├── determinism_report.json
    └── determinism_report.txt

Each happy_run_<i> and failure_run_<i> is a complete, independent run of its flow. The full artifact contract applies inside each.

meta/run_index.tsv

One row per workflow-iteration combination:

workflow  iteration  run_dir        stdout_log                     stderr_log                     status  duration_ms
happy     1          happy_run_1    logs/happy_run_1.stdout.log    logs/happy_run_1.stderr.log    ok      34121
happy     2          happy_run_2    logs/happy_run_2.stdout.log    logs/happy_run_2.stderr.log    ok      34008
happy     3          happy_run_3    logs/happy_run_3.stdout.log    logs/happy_run_3.stderr.log    ok      34194
failure   1          failure_run_1  logs/failure_run_1.stdout.log  logs/failure_run_1.stderr.log  ok      27810

This gives you a machine-readable index into the per-iteration artifacts.
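The file is plain TSV with a header row, so python3's csv module can read it directly. The sketch below is minimal and rests on two assumptions: the header matches the one shown above, and any status other than ok marks a failed iteration. This page does not document the exact non-ok values. parse_run_index, failed_rows, and load_run_index are hypothetical helper names, not part of the soak script:

```python
import csv
from pathlib import Path
from typing import Iterable


def parse_run_index(lines: Iterable[str]) -> list[dict[str, str]]:
    """Parse run_index.tsv content: one tab-separated header row, then data rows."""
    return list(csv.DictReader(lines, delimiter="\t"))


def failed_rows(rows: list[dict[str, str]]) -> list[dict[str, str]]:
    """Rows whose status is anything other than "ok" (assumed success marker)."""
    return [row for row in rows if row["status"] != "ok"]


def load_run_index(run_root: str) -> list[dict[str, str]]:
    """Read <run_root>/meta/run_index.tsv from a finished soak run."""
    path = Path(run_root) / "meta" / "run_index.tsv"
    with path.open(newline="") as fh:
        return parse_run_index(fh)
```

For example, failed_rows(load_run_index("/tmp/custom_root")) lists the iterations whose stderr_log you should read first. The stdout_log and stderr_log paths are relative to the run root.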

meta/determinism_report.json

{
  "schema_version": "determinism-report-v1",
  "iterations": 3,
  "total_frames": 2148,
  "matches": 2148,
  "divergences": 0,
  "divergence_ratio": 0.0,
  "per_iteration": [
    { "iter": 1, "workflow": "happy", "run_id": "doctor_happy_seed0", "frames": 716 },
    { "iter": 2, "workflow": "happy", "run_id": "doctor_happy_seed0", "frames": 716 },
    { "iter": 3, "workflow": "happy", "run_id": "doctor_happy_seed0", "frames": 716 }
  ],
  "volatile_events": [],
  "diverging_frames": []
}

Key fields:

  • total_frames — cumulative frames captured across all iterations.
  • matches — frames whose checksum equalled the iteration-1 baseline.
  • divergences — frames whose checksum differed.
  • divergence_ratio = divergences / total_frames. The gate passes iff this is zero (or every divergence is on the volatile allowlist).
  • volatile_events — events explicitly marked non-deterministic by schema (e.g. a wall-clock event that the system has acknowledged will never be soak-stable; rare).
  • diverging_frames — detail for each mismatched frame: iteration, frame index, baseline checksum, observed checksum, JSON pointer to the event in that iteration’s events.jsonl.
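These fields follow from comparing every iteration's frame checksums against the iteration-1 baseline. The sketch below reproduces that arithmetic under stated assumptions:

  • Each iteration arrives as a list of per-frame checksum strings in frame order. Extracting them from events.jsonl is not shown.
  • The volatile allowlist is modelled as a set of frame indices, whereas the real report records volatile events.
  • build_report is a hypothetical name, not the soak script's code.

```python
def build_report(iterations: list[list[str]],
                 volatile: frozenset[int] = frozenset()) -> dict:
    """Compare each iteration's per-frame checksums against iteration 1.

    iterations[k][i] is the checksum of frame i in iteration k + 1.
    volatile is a hypothetical stand-in for the volatile allowlist: frame
    indices that are allowed to diverge without failing the gate.
    """
    if not iterations:
        raise ValueError("need at least one iteration")
    baseline = iterations[0]
    total = 0
    matches = 0
    diverging = []
    for iter_no, frames in enumerate(iterations, start=1):
        # Walk the longer sequence so a missing or extra frame also diverges.
        for idx in range(max(len(baseline), len(frames))):
            expected = baseline[idx] if idx < len(baseline) else None
            observed = frames[idx] if idx < len(frames) else None
            total += 1
            if expected == observed:
                matches += 1
            else:
                diverging.append({
                    "iter": iter_no,
                    "frame_idx": idx,
                    "baseline_checksum": expected,
                    "observed_checksum": observed,
                })
    divergences = len(diverging)
    return {
        "iterations": len(iterations),
        "total_frames": total,
        "matches": matches,
        "divergences": divergences,
        "divergence_ratio": divergences / total if total else 0.0,
        "diverging_frames": diverging,
        # Gate: zero divergences, or every divergence is on the allowlist.
        "passed": all(d["frame_idx"] in volatile for d in diverging),
    }
```

Iteration 1 is compared against itself, so its frames count as matches. That is consistent with the example report above, where matches equals total_frames (3 × 716 = 2148).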

meta/determinism_report.txt

Human-readable summary. Shape:

Determinism Soak Report
-----------------------
Iterations:      3
Total frames:    2148
Matches:         2148
Divergences:     0
Divergence rate: 0.000%
Verdict:         PASS

Per-iteration:
  happy/1  ok  716 frames  run_id=doctor_happy_seed0
  happy/2  ok  716 frames  run_id=doctor_happy_seed0
  happy/3  ok  716 frames  run_id=doctor_happy_seed0

Debugging a divergence

Open determinism_report.json and locate the first diverging frame

jq '.diverging_frames[0]' meta/determinism_report.json

Note the iter, frame_idx, baseline_checksum, and observed_checksum.

Open both iterations’ events.jsonl around that frame (the line range below is an example; substitute the lines around the diverging event)

sed -n '4800,4820p' happy_run_1/meta/events.jsonl
sed -n '4800,4820p' happy_run_2/meta/events.jsonl
diff <(sed -n '4800,4820p' happy_run_1/meta/events.jsonl) \
     <(sed -n '4800,4820p' happy_run_2/meta/events.jsonl)

The first differing line is almost always the nondeterministic input.
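If you do not know which line range to look at, scanning the whole file is simpler than a fixed sed window. The minimal Python sketch below reports the first line where two events.jsonl files differ. It compares raw lines, just as diff does. first_divergence and first_divergence_in_files are hypothetical helper names:

```python
from itertools import zip_longest
from typing import Iterable, Optional, Tuple

Divergence = Tuple[int, Optional[str], Optional[str]]


def _strip(line: Optional[str]) -> Optional[str]:
    """Drop the trailing newline; keep None for a side that ran out of lines."""
    return None if line is None else line.rstrip("\n")


def first_divergence(lines_a: Iterable[str], lines_b: Iterable[str]) -> Optional[Divergence]:
    """Return (1-based line number, line_a, line_b) for the first differing line.

    A side that has run out of lines is reported as None.
    Returns None when both sequences are identical line for line.
    """
    for line_no, (a, b) in enumerate(zip_longest(lines_a, lines_b), start=1):
        if a != b:
            return line_no, _strip(a), _strip(b)
    return None


def first_divergence_in_files(path_a: str, path_b: str) -> Optional[Divergence]:
    """Same comparison over two events.jsonl files, read lazily line by line."""
    with open(path_a, encoding="utf-8") as fa, open(path_b, encoding="utf-8") as fb:
        return first_divergence(fa, fb)
```

For example, first_divergence_in_files("happy_run_1/meta/events.jsonl", "happy_run_2/meta/events.jsonl") returns the line number plus both versions of the first differing event, or None if the two logs match.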

Classify the source

Common culprits:

Pattern                           Fix
HashMap iteration order           Switch to BTreeMap or an explicit sort.
std::time::SystemTime::now        Route through DeterminismFixture::now_ms.
Thread scheduling in Cmd::Task    Ensure the task is ordered relative to update.
External process output           Seed the subprocess; capture its output deterministically.
Unbounded retry/backoff           Pin the retry count under E2E_DETERMINISTIC.
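The "explicit sort" fix comes down to canonicalising data before it is hashed or compared. doctor_frankentui is Rust, where the BTreeMap advice applies directly. The Python sketch below only illustrates the same idea in another language. frame_checksum and canonical_items are hypothetical helpers, not how doctor_frankentui computes its frame checksums:

```python
import hashlib
import json


def frame_checksum(state: dict) -> str:
    """SHA-256 over a canonical JSON encoding of a frame's state.

    sort_keys=True is the "explicit sort": two dicts with the same contents
    encode identically regardless of the order their keys were inserted in.
    """
    canonical = json.dumps(state, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def canonical_items(values: set) -> list:
    """Sorted list view of a set.

    Iteration order of a set of str depends on the per-process hash seed
    (PYTHONHASHSEED), the Python analogue of Rust's randomised HashMap
    order, so iterate over this sorted list instead of the set itself.
    """
    return sorted(values)
```

Without sort_keys=True, {"cursor": 3, "tabs": [...]} and the same dict built in the opposite order would serialise differently. Their checksums would then diverge across runs even though the frame content is identical.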

Fix, then re-soak

DOCTOR_FRANKENTUI_SOAK_RUNS=10 ./scripts/doctor_frankentui_determinism_soak.sh

A fix is good when 10 iterations produce zero divergences.

CI wiring

CI runs DOCTOR_FRANKENTUI_SOAK_RUNS=3 on every push and a nightly DOCTOR_FRANKENTUI_SOAK_RUNS=20 job. The nightly is where rare races surface; if the nightly goes red, treat it like any other CI failure — don’t wait for the “lucky green” retry.

Relationship to shadow-runs

A shadow-run compares two lanes of the same iteration. A soak compares N iterations of the same lane. They catch different bugs:

  • Shadow-run: catches behaviour-changing migrations (threading vs Asupersync, old diff vs new diff).
  • Soak: catches latent non-determinism inside a single lane.

Both are required for a green release. See shadow-run and rollout scorecard.

Pitfalls

Don’t whitelist a divergence to make the gate pass. Every entry on volatile_events is an admission that part of the system is not deterministic. Fix the source; add an entry only when determinism is genuinely impossible (e.g. a wall-clock-stamped audit event the system does not control).

Iteration count matters. A 3-iteration soak can easily miss a race that manifests only once in every 10 runs. Use 10+ iterations when investigating a suspected race, not 3.
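To put a number on this, assume (an idealisation) that a race takes its rare path independently with probability p in each iteration. Every iteration is compared against iteration 1, so the soak only sees a divergence when iterations disagree. It misses the race when all n runs take the common path or all take the rare one, which happens with probability (1 - p)^n + p^n. miss_probability is a hypothetical helper:

```python
def miss_probability(p: float, n: int) -> float:
    """Chance that n independent iterations all agree, hiding the race.

    p is the assumed per-iteration probability of the rare path. Only a
    disagreement between iterations shows up as a divergence, so the race
    is missed if every run is "common" or every run is "rare".
    """
    if not 0.0 <= p <= 1.0 or n < 1:
        raise ValueError("need 0 <= p <= 1 and n >= 1")
    return (1.0 - p) ** n + p ** n
```

For p = 0.1 the soak misses the race about 73% of the time at n = 3, about 35% at n = 10, and about 12% at n = 20 (the nightly setting).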

Fresh process each iteration. The script spawns a new doctor_frankentui_*_e2e.sh subprocess per iteration on purpose: leftover statics, thread-local state, or allocator state would hide a real bug. Don’t re-plumb the script to reuse state.
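The fresh-process rule is easy to keep if each iteration is launched as its own child process. The minimal Python sketch below shows such a loop. It assumes a caller-supplied command_for(run_dir) hook in place of the real doctor_frankentui_*_e2e.sh invocation, whose arguments are not documented here. The loop writes per-iteration logs and run_index.tsv rows in the layout described above. run_iterations, write_run_index, FIELDS, and the "failed" status value are hypothetical:

```python
import csv
import subprocess
import time
from pathlib import Path
from typing import Callable, Sequence

FIELDS = ["workflow", "iteration", "run_dir", "stdout_log",
          "stderr_log", "status", "duration_ms"]


def run_iterations(workflow: str, iterations: int, root: Path,
                   command_for: Callable[[Path], Sequence[str]]) -> list[dict]:
    """Run one workflow `iterations` times, each in a brand-new child process.

    command_for(run_dir) builds the argv for one iteration (hypothetical hook);
    the command itself is responsible for creating run_dir.
    """
    (root / "logs").mkdir(parents=True, exist_ok=True)
    rows = []
    for i in range(1, iterations + 1):
        run_dir = root / f"{workflow}_run_{i}"
        stdout_log = root / "logs" / f"{workflow}_run_{i}.stdout.log"
        stderr_log = root / "logs" / f"{workflow}_run_{i}.stderr.log"
        start = time.monotonic()
        # One subprocess.run per iteration: no statics, thread-locals, or
        # allocator state can carry over from the previous iteration.
        with stdout_log.open("wb") as out, stderr_log.open("wb") as err:
            proc = subprocess.run(list(command_for(run_dir)), stdout=out, stderr=err)
        rows.append({
            "workflow": workflow,
            "iteration": i,
            "run_dir": run_dir.name,
            "stdout_log": f"logs/{stdout_log.name}",
            "stderr_log": f"logs/{stderr_log.name}",
            "status": "ok" if proc.returncode == 0 else "failed",
            "duration_ms": int((time.monotonic() - start) * 1000),
        })
    return rows


def write_run_index(rows: list[dict], path: Path) -> None:
    """Write rows as a tab-separated file with a header line."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS, delimiter="\t",
                                lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
```

Each iteration is a separate subprocess.run call, so nothing in the child's memory survives into the next iteration. The only state shared between iterations is what lands on disk under the run root.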