Shadow-run comparison

A shadow-run is a two-lane, seed-identical execution of the same Model and event sequence, with frame checksums compared at the end. It is FrankenTUI’s primary mechanism for proving that swapping the runtime underneath (threading → Asupersync, old diff → new diff, legacy tick strategy → predictive) is behaviour-preserving.

Source: crates/ftui-harness/src/shadow_run.rs + lab_integration.rs.

Mental model

    ShadowRunConfig { seed, viewport, time_step, labels }
        ├──▶ Lab "baseline"  ─▶ LabSession ─▶ frames + checksums
        │                           ▲
        │                    scenario closure
        │                           ▼
        └──▶ Lab "candidate" ─▶ LabSession ─▶ frames + checksums

                         zip & compare ─▶ ShadowVerdict

Both lanes start from the same seed, render at the same viewport, and tick with the same time_step_ms. The scenario closure is called twice — once per lane, on two independently constructed Models — and the resulting FrameRecords (index, timestamp_ms, FNV-1a checksum) are zipped and compared. Any divergent checksum fails the test.
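To make the checksum-and-zip step concrete, here is a minimal standalone sketch. It assumes the 64-bit FNV-1a variant (the text above only says "FNV-1a") and hashes a plain byte slice rather than a rendered frame. The FrameRecord struct and the first_divergence function are illustrative stand-ins, not the harness's actual types.

```rust
/// One captured frame: index, deterministic timestamp, FNV-1a checksum.
/// Field names mirror the prose above; the real FrameRecord may differ.
#[derive(Debug, Clone, PartialEq)]
pub struct FrameRecord {
    pub index: usize,
    pub timestamp_ms: u64,
    pub checksum: u64,
}

/// 64-bit FNV-1a: XOR each byte into the hash, then multiply by the FNV prime.
pub fn fnv1a_64(bytes: &[u8]) -> u64 {
    const OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
    const PRIME: u64 = 0x0000_0100_0000_01b3;
    bytes
        .iter()
        .fold(OFFSET_BASIS, |hash, &b| (hash ^ u64::from(b)).wrapping_mul(PRIME))
}

/// Zip two lanes and return the position of the first mismatched checksum.
/// (Hypothetical helper; zip stops at the shorter lane.)
pub fn first_divergence(baseline: &[FrameRecord], candidate: &[FrameRecord]) -> Option<usize> {
    baseline
        .iter()
        .zip(candidate)
        .position(|(b, c)| b.checksum != c.checksum)
}
```

Because the checksum depends only on the bytes fed in, two lanes that render identical content produce identical records, and a single differing byte in any frame surfaces as a mismatch at that frame's index.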

API at a glance

ShadowRunConfig

    pub struct ShadowRunConfig {
        pub prefix: String,
        pub scenario_name: String,
        pub seed: u64,
        pub viewport_width: u16,
        pub viewport_height: u16,
        pub time_step_ms: u64,
        pub baseline_label: String,  // default "baseline"
        pub candidate_label: String, // default "candidate"
    }

Builder defaults: 80×24 viewport, 16ms time step.

| Method | Purpose |
| --- | --- |
| ShadowRunConfig::new(prefix, scenario_name, seed) | Construct with defaults. |
| .viewport(w, h) | Override frame capture size. |
| .time_step_ms(ms) | Override deterministic tick cadence. |
| .lane_labels(baseline, candidate) | Rename the lanes in JSONL output. |

ShadowVerdict

    pub enum ShadowVerdict {
        Match,    // every frame checksum matched
        Diverged, // at least one frame checksum differed
    }

ShadowRunResult

| Field | Type | Meaning |
| --- | --- | --- |
| verdict | ShadowVerdict | Overall outcome. |
| scenario_name | String | Label reused in JSONL. |
| seed | u64 | Seed used for both lanes. |
| frame_comparisons | Vec<FrameComparison> | Per-frame (index, baseline_checksum, candidate_checksum, matched). |
| first_divergence | Option<usize> | Index of the first mismatched frame. |
| frames_compared | usize | Count of zipped frames. |
| baseline / candidate | LabOutput | Full per-lane record (frames, events, anomalies). |
| baseline_label / candidate_label | String | Human-readable lane names. |
| run_total | u64 | Monotonic counter from shadow_runs_total(). |

Helpers: diverged_count(), match_ratio().
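As a rough illustration of what these helpers report, the sketch below derives both values from a list of per-frame comparisons. The FrameComparison struct copies the field names from the table above, and the free functions are hypothetical stand-ins for the methods. How the real match_ratio() treats an empty comparison is not stated here, so returning 1.0 in that case is an assumption.

```rust
/// Per-frame comparison row, mirroring the fields listed in the table above.
#[derive(Debug, Clone)]
pub struct FrameComparison {
    pub index: usize,
    pub baseline_checksum: u64,
    pub candidate_checksum: u64,
    pub matched: bool,
}

/// Number of frames whose checksums differed.
pub fn diverged_count(rows: &[FrameComparison]) -> usize {
    rows.iter().filter(|r| !r.matched).count()
}

/// Fraction of compared frames that matched, between 0.0 and 1.0.
/// Assumption: an empty comparison is treated as a full match (1.0).
pub fn match_ratio(rows: &[FrameComparison]) -> f64 {
    if rows.is_empty() {
        return 1.0;
    }
    let matched = rows.len() - diverged_count(rows);
    matched as f64 / rows.len() as f64
}
```

With four compared frames and one mismatch, diverged_count is 1 and match_ratio is 0.75.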

ShadowRun::compare

ShadowRun::compare(config, model_factory, scenario_fn) -> ShadowRunResult
  • model_factory: impl Fn() -> M — called twice, once per lane, so each LabSession owns an independent model instance.
  • scenario_fn: impl Fn(&mut LabSession<M>) — drives the session. Must be deterministic; no wall-clock reads, no RNG without a seeded source.
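The shape of this call can be sketched without the harness. In the simplified version below, Session is a hypothetical stand-in for LabSession, and both lanes run the same code, so any mismatch can only come from non-determinism in the factory or the scenario. The real ShadowRun::compare additionally swaps the runtime between lanes and returns a full ShadowRunResult.

```rust
/// Hypothetical stand-in for LabSession: one model plus the checksums it produced.
pub struct Session<M> {
    pub model: M,
    pub checksums: Vec<u64>,
}

/// Run one lane: build a fresh model, let the scenario drive it, return the checksums.
fn run_lane<M, F, S>(factory: &F, scenario: &S) -> Vec<u64>
where
    F: Fn() -> M,
    S: Fn(&mut Session<M>),
{
    let mut session = Session {
        model: factory(),
        checksums: Vec::new(),
    };
    scenario(&mut session);
    session.checksums
}

/// Run two lanes with the same factory and scenario; true if every checksum matches.
pub fn lanes_match<M, F, S>(factory: F, scenario: S) -> bool
where
    F: Fn() -> M,
    S: Fn(&mut Session<M>),
{
    let baseline = run_lane(&factory, &scenario);
    let candidate = run_lane(&factory, &scenario);
    baseline == candidate
}
```

The factory runs once per lane, so each session starts from its own model. A scenario that reads state outside the session (a shared counter, the wall clock) is enough to make lanes_match return false.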

All comparison evidence is emitted as JSONL via TestJsonlLogger.

Worked example: certifying a runtime migration

tests/migration_shadow.rs
    use ftui_harness::shadow_run::{ShadowRun, ShadowRunConfig, ShadowVerdict};
    use ftui_runtime::program::{Cmd, Model};
    use ftui_render::frame::Frame;
    use ftui_core::event::Event;
    use ftui_core::geometry::Rect;
    use ftui_widgets::paragraph::Paragraph;

    #[derive(Clone, Debug)]
    enum Msg {
        Tick,
    }

    // Every incoming event is treated as a tick.
    impl From<Event> for Msg {
        fn from(_: Event) -> Self {
            Msg::Tick
        }
    }

    #[derive(Default)]
    struct TickCounter {
        ticks: u64,
    }

    impl Model for TickCounter {
        type Message = Msg;

        fn update(&mut self, _: Msg) -> Cmd<Msg> {
            self.ticks += 1;
            Cmd::none()
        }

        // Render the counter on the first row of the frame.
        fn view(&self, frame: &mut Frame) {
            let s = format!("ticks={}", self.ticks);
            let area = Rect::new(0, 0, frame.width(), 1);
            Paragraph::new(s.as_str()).render(area, frame);
        }
    }

    #[test]
    fn runtime_migration_preserves_frames() {
        let config = ShadowRunConfig::new("migration_test", "tick_counter", 42)
            .viewport(80, 24)
            .time_step_ms(16)
            .lane_labels("threading", "asupersync");

        let result = ShadowRun::compare(
            config,
            TickCounter::default, // model_factory: called once per lane
            |session| {
                session.init();
                // Fixed iteration count keeps both lanes at 30 frames.
                for _ in 0..30 {
                    session.tick();
                    session.capture_frame();
                }
            },
        );

        assert_eq!(result.verdict, ShadowVerdict::Match);
        assert_eq!(result.first_divergence, None);
        assert!(result.match_ratio() > 0.999);
    }

On a divergence:

    match result.verdict {
        ShadowVerdict::Match => {}
        ShadowVerdict::Diverged => {
            let idx = result.first_divergence.unwrap();
            let fc = &result.frame_comparisons[idx];
            panic!(
                "divergence at frame {}: baseline={:016x} candidate={:016x}",
                fc.index, fc.baseline_checksum, fc.candidate_checksum,
            );
        }
    }

LabSession methods used inside the scenario

| Method | Effect |
| --- | --- |
| init() | Calls Model::init, logs viewport to JSONL. |
| send(msg) | Dispatches a message; records it to the event log. |
| inject_event(evt) / inject_events(&evts) | Injects real Events, converted to messages via From<Event>. |
| tick() | Injects Event::Tick, advances the deterministic clock by time_step_ms. |
| capture_frame() | Renders at the configured viewport, stores the FNV-1a checksum. |
| capture_frame_at(w, h) | Same, at custom dimensions. |
| frame_records() / event_log() | Inspect the record for this lane. |
| now_ms() | Deterministic milliseconds since session start. |
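The relationship between tick() and now_ms() can be pictured as a clock that advances only when the scenario says so. The LabClock type below is a hypothetical illustration, not the harness's implementation; it assumes the clock starts at 0 and that each tick adds exactly time_step_ms.

```rust
/// Hypothetical deterministic clock: time moves only when `tick` is called.
pub struct LabClock {
    time_step_ms: u64,
    now_ms: u64,
}

impl LabClock {
    /// Start at 0 ms with a fixed step per tick.
    pub fn new(time_step_ms: u64) -> Self {
        Self { time_step_ms, now_ms: 0 }
    }

    /// Advance by one step and return the new time.
    pub fn tick(&mut self) -> u64 {
        self.now_ms += self.time_step_ms;
        self.now_ms
    }

    /// Milliseconds elapsed since the clock was created.
    pub fn now_ms(&self) -> u64 {
        self.now_ms
    }
}
```

Under these assumptions, 30 ticks at a 16 ms step always leave the clock at 480 ms, regardless of how long the test takes in wall-clock time.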

Aggregating shadow runs for rollout decisions

ShadowRun is the raw primitive. For release-gating — combining multiple scenarios with an optional benchmark gate — wrap the results in a RolloutScorecard:

    use ftui_harness::rollout_scorecard::{
        RolloutScorecard, RolloutScorecardConfig, RolloutVerdict,
    };

    let mut scorecard = RolloutScorecard::new(
        RolloutScorecardConfig::default()
            .min_shadow_scenarios(3)
            .min_match_ratio(1.0)
            .require_benchmark_pass(true),
    );

    scorecard.add_shadow_result(result_a);
    scorecard.add_shadow_result(result_b);
    scorecard.add_shadow_result(result_c);
    scorecard.set_benchmark_gate(gate_result);

    assert_eq!(scorecard.evaluate(), RolloutVerdict::Go);
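The three thresholds in that configuration suggest a simple conjunction. The sketch below is a guess at the gating logic, written only to show how the knobs might interact; it is not the RolloutScorecard implementation. The GateConfig struct, the evaluate function, and the NoGo variant are all hypothetical (only Go appears in the example above).

```rust
/// Hypothetical verdict; only `Go` is confirmed by the example above.
#[derive(Debug, PartialEq)]
pub enum Verdict {
    Go,
    NoGo,
}

/// Hypothetical mirror of the three builder settings used above.
pub struct GateConfig {
    pub min_shadow_scenarios: usize,
    pub min_match_ratio: f64,
    pub require_benchmark_pass: bool,
}

/// Assumed rule: Go only if there are enough scenarios, every scenario meets
/// the match-ratio floor, and the benchmark gate passed when it is required.
pub fn evaluate(cfg: &GateConfig, match_ratios: &[f64], benchmark_passed: Option<bool>) -> Verdict {
    let enough = match_ratios.len() >= cfg.min_shadow_scenarios;
    let all_match = match_ratios.iter().all(|&r| r >= cfg.min_match_ratio);
    let bench_ok = !cfg.require_benchmark_pass || benchmark_passed == Some(true);
    if enough && all_match && bench_ok {
        Verdict::Go
    } else {
        Verdict::NoGo
    }
}
```

With min_match_ratio set to 1.0, a single diverged frame in any scenario is enough to block the rollout under this assumed rule.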

Pitfalls

The scenario closure must be pure. Reading the system clock, using rand::thread_rng(), or touching global state will make the two lanes diverge. Route randomness through a seeded source and time through session.now_ms().
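One way to route randomness through a seeded source is a small generator constructed from the run's seed inside the scenario. The sketch below uses SplitMix64, chosen here only because it fits in a few lines; the harness may provide its own seeded source, and the SplitMix64 type is not part of FrankenTUI.

```rust
/// Minimal SplitMix64 generator: the whole sequence is determined by the seed.
pub struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    pub fn new(seed: u64) -> Self {
        Self { state: seed }
    }

    /// Advance the state by a fixed increment, then scramble it.
    pub fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}
```

Two generators built from the same seed yield the same values in the same order, so a scenario that creates one per invocation behaves identically in both lanes. rand::thread_rng() gives no such guarantee.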

Frame counts must match. If the scenario captures 30 frames in one lane and 29 in the other (e.g. an early Cmd::Quit in the candidate only), frames_compared drops to the shorter lane's count and the length mismatch is reported as a divergence. Prefer scenarios that run to a fixed iteration count.
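The effect of unequal frame counts follows from how zipping works: pairing stops at the shorter sequence, so the extra frame is never compared and has to be caught by a separate length check. The sketch below illustrates this with plain checksum slices; Comparison and compare_lanes are hypothetical names, and treating a length mismatch as a divergence follows the description above rather than the harness source.

```rust
/// Outcome of comparing two lanes' checksum lists (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct Comparison {
    pub frames_compared: usize,
    pub diverged: bool,
}

pub fn compare_lanes(baseline: &[u64], candidate: &[u64]) -> Comparison {
    // zip stops at the shorter lane, so unpaired trailing frames are skipped.
    let frames_compared = baseline.iter().zip(candidate).count();
    let checksum_mismatch = baseline.iter().zip(candidate).any(|(b, c)| b != c);
    // Unequal lengths are treated as a divergence even if every pair matched.
    let length_mismatch = baseline.len() != candidate.len();
    Comparison {
        frames_compared,
        diverged: checksum_mismatch || length_mismatch,
    }
}
```

Comparing a 30-frame lane against a 29-frame lane here gives frames_compared equal to 29 and diverged set to true, even when the first 29 checksums agree.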

Don’t share a Model across lanes. Use the model_factory closure; constructing it once and cloning will leak cross-lane state for anything behind Arc or Rc.
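The leak is easy to reproduce with standard-library types alone. In the sketch below, SharedCounter is a hypothetical model whose state sits behind an Rc; cloning it copies the pointer, not the counter, so a tick in one lane is visible in the other.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical model whose state lives behind a reference-counted pointer.
#[derive(Clone, Default)]
pub struct SharedCounter {
    pub ticks: Rc<Cell<u64>>,
}

impl SharedCounter {
    /// Increment the counter and return the new value.
    pub fn tick(&self) -> u64 {
        self.ticks.set(self.ticks.get() + 1);
        self.ticks.get()
    }
}

/// Cloning one prototype: both lanes point at the same counter.
pub fn lanes_from_clone() -> (SharedCounter, SharedCounter) {
    let prototype = SharedCounter::default();
    (prototype.clone(), prototype.clone())
}

/// Calling the factory per lane: each lane gets its own counter.
pub fn lanes_from_factory() -> (SharedCounter, SharedCounter) {
    (SharedCounter::default(), SharedCounter::default())
}
```

After one tick on the first lane from lanes_from_clone, the second lane's counter also reads 1; with lanes_from_factory it stays at 0.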