
Replay triage

A failed doctor_frankentui run leaves behind a full JSONL ledger plus per-case captures. For humans that is too much. The replay triage script walks the ledger, extracts the top-N signals — structured failure pointers with expected/actual evidence — and emits either a compact summary or a JSON report you can paste into a PR or an issue.

Source: scripts/doctor_frankentui_replay_triage.py.

Running it

```shell
python3 ./scripts/doctor_frankentui_replay_triage.py \
  --run-root /tmp/doctor_frankentui/e2e/failure_20260423T000000Z \
  --max-signals 8
```

Full flags:

| Flag | Default | Purpose |
|---|---|---|
| `--run-root PATH` | required | Path to a doctor_frankentui run root containing `meta/`. |
| `--output-json PATH` | empty | Write the structured report to PATH instead of stdout. |
| `--max-signals N` | 5 | Cap the number of signals in the compact output. |
| `--max-timeline N` | 40 | Cap the number of timeline entries. |

The signal model

Every signal is one Signal dataclass instance:

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class Signal:
    severity: int            # 0 = info, 1 = warn, 2 = error
    message: str             # human-readable summary
    event_index: int | None  # line index into meta/events.jsonl
    event_type: str | None   # e.g. "frame_captured", "diff_decision"
    case_id: str | None      # failing case (from meta/case_results.json)
    step_id: str | None      # failing step (from meta/step_results.tsv)
    pointers: list[str]      # JSON pointers to evidence
    expected: dict[str, Any]
    actual: dict[str, Any]
```

Severity levels

  • 0 (info) — contextual observation. Not a failure by itself but narrows the search.
  • 1 (warn) — non-fatal divergence (e.g. a degradation_event followed by a clean recovery).
  • 2 (error) — a hard failure. At least one signal at this level is present whenever the underlying run failed.

Pointer syntax

Pointers are resolved relative to the run root and use a simple path#anchor syntax:

| Form | Meaning |
|---|---|
| `meta/events.jsonl#L<N>` | Line N in `events.jsonl`. |
| `cases/<case_id>/<file>` | File under the case directory. |
| `meta/<file>#/<path>` | JSON Pointer into a structured file. |

What signals get extracted

The script scans meta/events.jsonl and classifies events in priority order.

Frame checksum mismatches

```json
{
  "event_type": "frame_checksum_mismatch",
  "frame_idx": 42,
  "expected": "0xdeadbeef00000000",
  "actual": "0xcafef00d00000000"
}
```

Severity 2. The first occurrence is usually the smoking gun for a render regression.

Contract violations

Events with event_type in {"contract_violation", "invariant_break", "validator_failure"}. Severity 2. The expected/actual fields contain the specific invariant that broke.

Degradation cascade

Events with event_type == "degradation_event" whose to_tier indicates a downgrade (Full → SimpleBorders / NoColors / TextOnly — see frame budget). Severity 1.

Guard breaches

Events with event_type == "conformal_frame_guard" whose verdict is "breach". Severity 1 (repeated breaches cluster into a single signal with a count).

Resize storms

Events with event_type == "resize_decision" whose coalesce outcome indicates unhandled chaos. Severity 1.
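Taken together, the classifications above amount to a table-driven pass over meta/events.jsonl. The event-type names and severities below come from this page; the dispatch shape is a sketch, not the script's actual implementation (the real scan also inspects payloads, e.g. `to_tier` downgrades and `verdict == "breach"`):

```python
import json

# Severity per recognized event type (names from the sections above).
SEVERITY = {
    "frame_checksum_mismatch": 2,
    "contract_violation": 2,
    "invariant_break": 2,
    "validator_failure": 2,
    "degradation_event": 1,
    "conformal_frame_guard": 1,
    "resize_decision": 1,
}

def classify(ledger_lines):
    """Yield (line_no, severity, event) for each recognized JSONL event."""
    for line_no, line in enumerate(ledger_lines, start=1):
        event = json.loads(line)
        severity = SEVERITY.get(event.get("event_type"))
        if severity is not None:
            yield line_no, severity, event
```

The line number doubles as the `#L<N>` anchor for the signal's first pointer.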

Case-level failure

The script reads meta/case_results.json; every case with status: "failed" becomes a severity-2 signal, annotated with the case’s artifact_dir and the events_pointer into the JSONL.
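Assuming case_results.json holds a top-level `cases` list whose entries carry `case_id`, `status`, `artifact_dir`, and `events_pointer` (only the field names mentioned above are from the source; the envelope is an assumption), the extraction might look like:

```python
def failed_case_signals(results: dict):
    """Yield a severity-2 signal stub for every failed case in the parsed
    case_results.json document."""
    for case in results.get("cases", []):
        if case.get("status") == "failed":
            yield {
                "severity": 2,
                "case_id": case.get("case_id"),
                "pointers": [case.get("artifact_dir"), case.get("events_pointer")],
            }
```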

Output shapes

Compact (default)

```text
Replay Triage: doctor_failure_seed0
Run root: /tmp/doctor_frankentui/e2e/failure_20260423T000000Z

Signals (top 8)
---------------
[ERROR] frame checksum diverged at index 42
        event_type=frame_captured case=case_003 step=resize_storm
        expected={"checksum":"0xdeadbeef00000000"}
        actual  ={"checksum":"0xcafef00d00000000"}
        pointers: meta/events.jsonl#L4812
                  cases/case_003/frame_42.txt
[WARN]  degradation downgrade Full -> NoColors (budget p99 breach)
        event_type=degradation_event case=case_003 step=resize_storm
        expected={"tier":"Full"}
        actual  ={"tier":"NoColors","reason":"conformal_frame_guard_breach"}
        pointers: meta/events.jsonl#L4740

Timeline (last 40 events)
-------------------------
```

Structured (--output-json)

The JSON report mirrors the compact output but is trivially machine-readable. It is the same shape used for meta/replay_triage_report.json in a failure-path run — see artifacts.

```shell
python3 ./scripts/doctor_frankentui_replay_triage.py \
  --run-root /tmp/doctor_frankentui/e2e/failure_20260423T000000Z \
  --output-json /tmp/triage.json \
  --max-signals 8
jq '.signals[0]' /tmp/triage.json
```

Typical triage loop

Run the failure flow

```shell
./scripts/doctor_frankentui_failure_e2e.sh
```

Artifacts land under /tmp/doctor_frankentui/e2e/failure_<ts>/.

Run triage

```shell
python3 ./scripts/doctor_frankentui_replay_triage.py \
  --run-root /tmp/doctor_frankentui/e2e/failure_<ts> \
  --max-signals 8
```

Follow the pointers

For each severity-2 signal, open its first pointers[] entry. That’s almost always the event where reality diverged from expectation.

```shell
# pointer form: meta/events.jsonl#L4812
sed -n '4805,4820p' meta/events.jsonl | jq .
```

Grep the broader context

Cross-reference with the grep recipes in evidence grep patterns — the event right before the failure is usually the cause.

Reproduce with a shadow-run

Once you have a hypothesis, reproduce it as a shadow-run scenario: two lanes (before / after your fix), same seed, same events, frame checksums must match. If they match, you have a regression test.
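The checksum comparison across two lanes' ledgers can be sketched as below. The `frame_captured`/`checksum` field names are taken from the sample output above; treat the exact event shape as an assumption:

```python
import json

def checksums(ledger_lines) -> dict[int, str]:
    """Map frame_idx -> checksum for every frame_captured event in a ledger."""
    out = {}
    for line in ledger_lines:
        event = json.loads(line)
        if event.get("event_type") == "frame_captured":
            out[event["frame_idx"]] = event["checksum"]
    return out

def lanes_match(before_lines, after_lines) -> bool:
    # Same seed, same events: every frame checksum must agree.
    return checksums(before_lines) == checksums(after_lines)
```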

Pitfalls

Triage is not a fix. Signals point at where reality diverged from expectation — that’s the symptom, not always the root cause. The root cause is often several events earlier; scan the timeline backwards from the first severity-2 signal.

Missing events.jsonl. If the child process crashed before emitting any event, triage has nothing to say. In that case start from logs/<step>.stderr.log and meta/command_manifest.txt — see artifacts troubleshooting.

--max-signals caps the compact output, not the JSON report. The JSON report always contains every signal the script found. Use it for archival; use the compact form for human scanning.
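Because the JSON report is complete, it can be tallied programmatically for archival dashboards. A sketch, assuming only the top-level `signals` array and per-signal `severity` field shown earlier on this page:

```python
def signal_counts(report: dict) -> dict[int, int]:
    """Tally signals in a parsed triage report by severity level."""
    counts = {0: 0, 1: 0, 2: 0}
    for sig in report.get("signals", []):
        counts[sig["severity"]] += 1
    return counts
```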