Testing overview

FrankenTUI’s verification layer exists to make one promise believable: a frame that rendered correctly yesterday will render correctly today, across every supported terminal profile, and any migration of the runtime underneath has to prove it before it ships.

That promise is expensive. The testing stack earns it with four principles:

- No mocks where behaviour matters. Tests exercise real Buffers, real BufferDiffs, real Presenters, and a real ProgramSimulator. Output is captured, not faked.
- Snapshots for every demo screen. All 46 screens in ftui-demo-showcase have insta-backed snapshots checked into the repo.
- Deterministic fixtures. A small set of environment variables (E2E_DETERMINISTIC, E2E_SEED, E2E_TIME_STEP_MS) makes seeds, timestamps, and run IDs stable from one run to the next. See determinism fixtures.
- Shadow-runs for runtime migrations. ShadowRun::compare runs the same model twice through two independent lanes and refuses to merge anything that changes the frame checksum.

This page is the map. Each of the six sub-pages below is the territory for one part of the stack.

Why no mocks

The no-mock policy forbids hiding real behaviour behind trait objects. The rules are simple:

- Output capture is allowed: a Vec<u8> used as a Write target to record ANSI output is a sink, not a mock.
- Data builders are allowed: test_panel() constructing a Panel is a factory, not a substitute.
- Headless rendering is encouraged: ProgramSimulator, HeadlessTerm, and direct Buffer/Frame use are the first choice.
- Mocking terminal I/O, event parsers, or the render pipeline is disallowed. Use PTY tests, real InputParser byte feeds, and real buffers + diffs.

The consequence: every passing test exercises the same code path a user would. A green test cannot hide an incorrect ANSI sequence.
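The difference between a sink and a mock can be sketched in a few lines of std-only Rust. This is an illustration, not FrankenTUI code: render_bold_label and capture_bold_label are hypothetical names. The point is that the real writing code runs unchanged and only the destination differs.

```rust
use std::io::{self, Write};

/// Hypothetical renderer (not part of FrankenTUI): emits a bold label
/// followed by an SGR reset. It is generic over `Write`, so production
/// code can pass stdout and tests can pass a `Vec<u8>` sink.
fn render_bold_label<W: Write>(out: &mut W, label: &str) -> io::Result<()> {
    // ESC [ 1 m enables bold; ESC [ 0 m resets all attributes.
    write!(out, "\x1b[1m{label}\x1b[0m")
}

/// Runs the real rendering code against an in-memory sink and returns
/// the exact bytes that would have reached the terminal.
fn capture_bold_label(label: &str) -> Vec<u8> {
    let mut sink: Vec<u8> = Vec::new();
    render_bold_label(&mut sink, label).expect("writing to a Vec<u8> should not fail");
    sink
}
```

A test can then assert on the captured bytes directly, so an incorrect escape sequence fails the comparison instead of being hidden behind a stubbed writer.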

Map of the stack

- src/lib.rs: snapshot macros, buffer→text/ANSI
- src/determinism.rs: seed + clock + env capture
- src/lab_integration.rs: Lab + LabSession + FrameRecord
- src/shadow_run.rs: ShadowRun::compare + ShadowVerdict
- src/rollout_scorecard.rs: Go / NoGo / Inconclusive
- src/benchmark_gate.rs: Threshold + Measurement + GateResult
- src/cli.rs: replay / seed-demo / migrate / certify / report / plan
- src/doctor.rs: environment certification
- src/capture.rs: replay orchestration
- src/suite.rs: profile-matrix replay
- e2e_test.sh: main PTY suite
- doctor_frankentui_*.sh: certification + determinism + coverage
- perf_regression_gate.sh: criterion budget check
- bench_budget.sh: budget-constrained benchmarks

The four verification layers

Unit + snapshot layer (`ftui-harness` + `ftui-demo-showcase`)

In-process tests call assert_snapshot!(name, &buf) after rendering a widget or screen. Mismatches dump a unified diff and fail the run. Updating is one env var away:


    BLESS=1 cargo test -p ftui-demo-showcase
    cargo insta review

See snapshot tests.

Headless-runtime layer (`ProgramSimulator`)

ProgramSimulator runs a Model with no terminal attached. It injects events, sends messages, advances ticks, and captures frames. Use it for runtime-loop correctness tests, where state-machine transitions matter more than the rendered pixels.
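The shape of a headless test can be sketched without the real API. Everything in the block below is hypothetical and only illustrates the inject-then-capture loop: the Event enum, the Model trait, the Counter model, and the simulate helper are stand-ins, not the actual ProgramSimulator interface.

```rust
/// Hypothetical event type for this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Event {
    Key(char),
    Tick,
}

/// Hypothetical model trait: update state on an event, render state to text.
trait Model {
    fn update(&mut self, event: Event);
    fn view(&self) -> String;
}

/// A tiny model whose state machine is easy to check by hand.
struct Counter {
    count: i32,
    ticks: u32,
}

impl Model for Counter {
    fn update(&mut self, event: Event) {
        match event {
            Event::Key('+') => self.count += 1,
            Event::Key('-') => self.count -= 1,
            Event::Key(_) => {} // unrelated keys leave the state untouched
            Event::Tick => self.ticks += 1,
        }
    }

    fn view(&self) -> String {
        format!("count={} ticks={}", self.count, self.ticks)
    }
}

/// Feeds events to the model one at a time and records the frame
/// produced after each event. No terminal is involved.
fn simulate<M: Model>(model: &mut M, events: &[Event]) -> Vec<String> {
    events
        .iter()
        .map(|&event| {
            model.update(event);
            model.view()
        })
        .collect()
}
```

Because the helper returns one frame per event, a test can assert on the whole sequence of transitions and not just the final state.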

Shadow-run layer (`ShadowRun::compare`)

ShadowRun::compare runs the same model twice under deterministic seeds through two independent LabSessions and compares the frame checksums. It is used to certify that a runtime migration (threading → Asupersync executor, diff strategy swap, etc.) is behaviour-preserving. See runtime rollout lanes, the shadow-run docs, and the rollout scorecard.
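The comparison step can be sketched as follows. This is a minimal illustration under stated assumptions, not the real ShadowRun or ShadowVerdict types: frame_checksum uses FNV-1a as a stand-in for whatever checksum the harness actually computes, and Verdict and compare_lanes are hypothetical names.

```rust
/// FNV-1a 64-bit hash over the frame bytes. A stand-in checksum for
/// this sketch; the real harness may use a different function.
fn frame_checksum(frame: &str) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325; // FNV offset basis
    for byte in frame.bytes() {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(0x100000001b3); // FNV prime
    }
    hash
}

/// Hypothetical verdict type: either both lanes agree on every frame,
/// or they diverge at a specific frame index.
#[derive(Debug, PartialEq)]
enum Verdict {
    Match,
    Diverged { frame_index: usize },
}

/// Walks both lanes in lockstep and reports the first frame whose
/// checksum differs. A length mismatch counts as a divergence at the
/// first frame that exists in only one lane.
fn compare_lanes(baseline: &[&str], candidate: &[&str]) -> Verdict {
    for (index, (a, b)) in baseline.iter().zip(candidate).enumerate() {
        if frame_checksum(a) != frame_checksum(b) {
            return Verdict::Diverged { frame_index: index };
        }
    }
    if baseline.len() != candidate.len() {
        return Verdict::Diverged {
            frame_index: baseline.len().min(candidate.len()),
        };
    }
    Verdict::Match
}
```

Reporting the index of the first divergent frame, instead of a bare pass/fail, makes it easier to find the event that exposed the difference between the two lanes.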

E2E + CI-gate layer (`scripts/*.sh` + `doctor_frankentui`)

Shell scripts drive the full app through a PTY, write JSONL evidence ledgers, and enforce coverage and performance budgets. CI fails on any gate breach. See E2E scripts and benchmark gate.

The determinism contract

A deterministic test run produces:

- Stable run_id of the form {prefix}_seed{seed}.
- Stable timestamps of the form T{n:06} instead of wall-clock seconds.
- A monotonic now_ms() that advances by E2E_TIME_STEP_MS per call.
- A stable env_snapshot capturing TERM, COLORTERM, NO_COLOR, TMUX, ZELLIJ, seed, and deterministic.

This is what makes the frame checksums in shadow-runs comparable across two lanes, and what lets the determinism soak test detect non-volatile divergence.
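The first three guarantees can be sketched in std-only Rust. DeterministicRun is a hypothetical type, not the API of src/determinism.rs. Two details are assumptions made for the sketch: the timestamp counter starts at 0, and now_ms() advances before returning.

```rust
/// Hypothetical fixture mirroring the contract: a fixed run ID, a
/// counter-based timestamp, and a clock that moves only when asked.
struct DeterministicRun {
    prefix: String,
    seed: u64,
    time_step_ms: u64,
    clock_ms: u64,
    stamp: u64,
}

impl DeterministicRun {
    fn new(prefix: &str, seed: u64, time_step_ms: u64) -> Self {
        Self {
            prefix: prefix.to_string(),
            seed,
            time_step_ms,
            clock_ms: 0,
            stamp: 0,
        }
    }

    /// Run ID of the form {prefix}_seed{seed}; depends only on inputs.
    fn run_id(&self) -> String {
        format!("{}_seed{}", self.prefix, self.seed)
    }

    /// Monotonic clock: each call advances by the configured step.
    /// Assumption: the value is advanced first, then returned.
    fn now_ms(&mut self) -> u64 {
        self.clock_ms += self.time_step_ms;
        self.clock_ms
    }

    /// Timestamp of the form T{n:06}, a zero-padded counter in place
    /// of wall-clock time. Assumption: the counter starts at 0.
    fn next_timestamp(&mut self) -> String {
        let stamp = format!("T{:06}", self.stamp);
        self.stamp += 1;
        stamp
    }
}
```

Since no method reads the system clock or a random source, two runs constructed with the same prefix, seed, and step produce identical IDs and timestamps, which is the property the checksum comparison relies on.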

Sub-pages

- Program simulator
- Shadow-run
- Snapshot tests
- Determinism fixtures
- E2E scripts
- Benchmark gate

Related pages:

- doctor_frankentui overview
- Rollout: shadow-run
- Rollout: scorecard
- Contributing: snapshot blessing
- Evidence sink
