Testing overview
FrankenTUI’s verification layer exists to make one promise believable: a frame that rendered correctly yesterday will render correctly today, across every supported terminal profile, and any migration of the runtime underneath has to prove it before it ships.
That promise is expensive. The testing stack earns it with four principles:
- No mocks where behaviour matters. Tests exercise real Buffers, real BufferDiffs, real Presenters, and a real ProgramSimulator. Output is captured, not faked.
- Snapshots for every demo screen. All 46 screens in ftui-demo-showcase have insta-backed snapshots checked into the repo.
- Deterministic fixtures. A small set of environment variables (E2E_DETERMINISTIC, E2E_SEED, E2E_TIME_STEP_MS) makes seeds, timestamps, and run IDs stable from one run to the next. See determinism fixtures.
- Shadow-runs for runtime migrations. ShadowRun::compare runs the same model twice through two independent lanes and refuses to merge anything that changes the frame checksum.
This page is the map. Each of the six sub-pages below is the territory for one part of the stack.
Why no mocks
The no-mock policy forbids hiding real behaviour behind trait objects. The rules are simple:
- Output capture is allowed: a Vec<u8> that implements Write to record ANSI output is not a mock, it is a sink.
- Data builders are allowed: test_panel() constructing a Panel is a factory, not a substitute.
- Headless rendering is encouraged: ProgramSimulator, HeadlessTerm, and direct Buffer/Frame use are the first choice.
- Mocking terminal I/O, event parsers, or the render pipeline is disallowed. Use PTY tests, real InputParser byte feeds, and real buffers + diffs.
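The "sink, not mock" distinction can be illustrated with a minimal sketch that uses only the standard library. The function present_bold below is a hypothetical stand-in for a presenter and is not part of FrankenTUI; the point is only that the bytes asserted on are the bytes the code under test actually wrote.

```rust
use std::io::{self, Write};

/// Hypothetical stand-in for a presenter (not a FrankenTUI API):
/// moves the cursor, writes bold text, then resets attributes.
fn present_bold<W: Write>(out: &mut W, row: u16, col: u16, text: &str) -> io::Result<()> {
    // CSI row;col H positions the cursor, SGR 1 enables bold, SGR 0 resets.
    write!(out, "\x1b[{};{}H\x1b[1m{}\x1b[0m", row, col, text)
}

/// Renders into an in-memory sink and returns the exact bytes emitted.
fn capture_bold(row: u16, col: u16, text: &str) -> Vec<u8> {
    // Vec<u8> implements std::io::Write, so it records output without faking behaviour.
    let mut sink: Vec<u8> = Vec::new();
    present_bold(&mut sink, row, col, text).expect("writing to a Vec<u8> does not fail");
    sink
}
```

A test can then compare the captured bytes against the expected ANSI sequence byte for byte, so a wrong escape code fails the assertion instead of slipping past a stubbed writer.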
The consequence: every passing test exercises the same code path a user would. A green test cannot hide an incorrect ANSI sequence.
Map of the stack
- src/lib.rs: snapshot macros, buffer→text/ANSI
- src/determinism.rs: seed + clock + env capture
- src/lab_integration.rs: Lab + LabSession + FrameRecord
- src/shadow_run.rs: ShadowRun::compare + ShadowVerdict
- src/rollout_scorecard.rs: Go / NoGo / Inconclusive
- src/benchmark_gate.rs: Threshold + Measurement + GateResult
- src/cli.rs: replay / seed-demo / migrate / certify / report / plan
- src/doctor.rs: environment certification
- src/capture.rs: replay orchestration
- src/suite.rs: profile-matrix replay
- e2e_test.sh: main PTY suite
- doctor_frankentui_*.sh: certification + determinism + coverage
- perf_regression_gate.sh: criterion budget check
- bench_budget.sh: budget-constrained benchmarks
The four verification layers
Unit + snapshot layer (ftui-harness + ftui-demo-showcase)
In-process tests call assert_snapshot!(name, &buf) after rendering a
widget or screen. Mismatches dump a unified diff and fail the run.
Updating is one env var away:
BLESS=1 cargo test -p ftui-demo-showcase
cargo insta review
See snapshot tests.
Headless-runtime layer (ProgramSimulator)
Runs a Model with no terminal. Injects events, sends messages, advances
ticks, captures frames. Used for runtime-loop correctness tests where
pixels are secondary to state machine transitions.
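The shape of a headless run can be sketched in a few lines. Everything below is hypothetical: the Model trait, the Counter model, and the Simulator driver are illustrative stand-ins, and the real ProgramSimulator API may look quite different. The sketch only shows the idea of feeding messages to a model and recording one frame per step, with no terminal involved.

```rust
/// Hypothetical model trait; the real ftui Model trait may differ.
trait Model {
    type Msg;
    fn update(&mut self, msg: Self::Msg);
    fn view(&self) -> String;
}

/// Illustrative model: a counter that ignores ticks.
struct Counter {
    value: i32,
}

enum CounterMsg {
    Increment,
    Tick,
}

impl Model for Counter {
    type Msg = CounterMsg;

    fn update(&mut self, msg: CounterMsg) {
        match msg {
            CounterMsg::Increment => self.value += 1,
            CounterMsg::Tick => {} // ticks leave this model's state unchanged
        }
    }

    fn view(&self) -> String {
        format!("count: {}", self.value)
    }
}

/// Minimal headless driver: applies a message, then records the resulting frame.
struct Simulator<M: Model> {
    model: M,
    frames: Vec<String>,
}

impl<M: Model> Simulator<M> {
    fn new(model: M) -> Self {
        Self { model, frames: Vec::new() }
    }

    fn send(&mut self, msg: M::Msg) {
        self.model.update(msg);
        self.frames.push(self.model.view());
    }
}

/// Runs a fixed script and returns every captured frame in order.
fn run_counter_script() -> Vec<String> {
    let mut sim = Simulator::new(Counter { value: 0 });
    sim.send(CounterMsg::Increment);
    sim.send(CounterMsg::Tick);
    sim.send(CounterMsg::Increment);
    sim.frames
}
```

Because the frames are plain values, a test can assert on the full sequence of state transitions rather than only on the final screen.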
Shadow-run layer (ShadowRun::compare)
Runs the same model twice under deterministic seeds through two
independent LabSessions. Compares frame checksums. Used to certify
that a runtime migration (threading → Asupersync executor, diff strategy
swap, etc.) is behaviour-preserving. Cross-links to
runtime rollout lanes,
shadow-run docs, and the
rollout scorecard.
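The comparison step can be sketched as follows. This is an illustration under stated assumptions, not the actual ShadowRun::compare implementation: the Verdict enum, the compare_lanes function, and the choice of FNV-1a as the checksum are all hypothetical, and each lane is represented simply as a list of frame byte buffers.

```rust
/// Hypothetical verdict type; not the real ShadowVerdict.
#[derive(Debug, PartialEq)]
enum Verdict {
    Match,
    /// Index of the first frame whose checksum differs or is missing in one lane.
    Diverged { frame: usize },
}

/// FNV-1a 64-bit hash, used here as a stand-in for the harness's frame checksum.
fn checksum(frame: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325; // FNV offset basis
    for &byte in frame {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(0x100000001b3); // FNV prime
    }
    hash
}

/// Compares two lanes frame by frame and reports the first divergence.
fn compare_lanes(baseline: &[Vec<u8>], candidate: &[Vec<u8>]) -> Verdict {
    let shared = baseline.len().min(candidate.len());
    for i in 0..shared {
        if checksum(&baseline[i]) != checksum(&candidate[i]) {
            return Verdict::Diverged { frame: i };
        }
    }
    // A lane that produced extra or fewer frames also counts as divergence.
    if baseline.len() != candidate.len() {
        return Verdict::Diverged { frame: shared };
    }
    Verdict::Match
}
```

Reporting the index of the first divergent frame, rather than a bare pass or fail, makes it easier to locate the step at which a migrated runtime stopped matching the baseline.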
E2E + CI-gate layer (scripts/*.sh + doctor_frankentui)
Shell scripts drive the full app through a PTY, write JSONL evidence ledgers, and enforce coverage and performance budgets. CI fails on any gate breach. See E2E scripts and benchmark gate.
The determinism contract
A deterministic test run produces:
- Stable run_id of the form {prefix}_seed{seed}.
- Stable timestamps of the form T{n:06} instead of wall-clock seconds.
- A monotonic now_ms() that advances by E2E_TIME_STEP_MS per call.
- A stable env_snapshot capturing TERM, COLORTERM, NO_COLOR, TMUX, ZELLIJ, seed, and deterministic.
This is what makes the frame checksums in shadow-runs comparable across two lanes, and what lets the determinism soak test detect non-volatile divergence.
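A minimal sketch of the first three guarantees, assuming a hypothetical DeterministicClock type and run_id helper (neither name is taken from src/determinism.rs). Two details are assumptions of this sketch and not stated by the contract: the clock starts at zero and returns the time before advancing, and the timestamp counter starts at zero.

```rust
/// Hypothetical deterministic clock; names and starting values are illustrative.
struct DeterministicClock {
    current_ms: u64,
    step_ms: u64,
    next_stamp: u64,
}

impl DeterministicClock {
    /// step_ms plays the role of E2E_TIME_STEP_MS.
    fn new(step_ms: u64) -> Self {
        Self { current_ms: 0, step_ms, next_stamp: 0 }
    }

    /// Returns the current time, then advances by one fixed step (assumed ordering).
    fn now_ms(&mut self) -> u64 {
        let now = self.current_ms;
        self.current_ms += self.step_ms;
        now
    }

    /// Produces a counter-based timestamp of the form T{n:06}.
    fn timestamp(&mut self) -> String {
        let n = self.next_stamp;
        self.next_stamp += 1;
        format!("T{:06}", n)
    }
}

/// Builds a run ID of the form {prefix}_seed{seed}.
fn run_id(prefix: &str, seed: u64) -> String {
    format!("{}_seed{}", prefix, seed)
}
```

Since nothing here reads the wall clock, two runs with the same seed and step produce identical IDs and timestamps, which is the property the checksum comparison depends on.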