Shadow-run comparison

A shadow-run is a two-lane, seed-identical execution of the same Model and event sequence, with frame checksums compared at the end. It is FrankenTUI’s primary mechanism for proving that swapping the runtime underneath (threading → Asupersync, old diff → new diff, legacy tick strategy → predictive) is behaviour-preserving.

Source: `crates/ftui-harness/src/shadow_run.rs` and `lab_integration.rs`.

Mental model


```text
ShadowRunConfig { seed, viewport, time_step, labels }
        │
        ├──▶  Lab "baseline"   ─▶  LabSession  ─▶  frames + checksums
        │                              ▲
        │                      scenario closure
        │                              ▼
        └──▶  Lab "candidate"  ─▶  LabSession  ─▶  frames + checksums
                                      │
                             zip & compare ─▶ ShadowVerdict
```

Both lanes start from the same seed, render at the same viewport, and tick with the same `time_step_ms`. The scenario closure runs twice, once per lane, each time against an independently constructed Model. The resulting `FrameRecord`s (index, `timestamp_ms`, FNV-1a checksum) are zipped pairwise and compared. Any mismatched checksum produces `ShadowVerdict::Diverged`, which fails any test that asserts a match.
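The checksum step can be sketched with the standard 64-bit FNV-1a constants. This is a minimal, std-only illustration that assumes the 64-bit variant; the `{:016x}` formatting used later suggests 64-bit checksums. `fnv1a_64` and `frame_checksum` are hypothetical names. Hashing text rows stands in for whatever bytes the harness actually hashes from a rendered frame.

```rust
/// Standard 64-bit FNV-1a offset basis and prime.
const FNV_OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// 64-bit FNV-1a: XOR each byte into the hash, then multiply by the prime.
fn fnv1a_64(bytes: &[u8]) -> u64 {
    let mut hash = FNV_OFFSET_BASIS;
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(FNV_PRIME);
    }
    hash
}

/// Hypothetical stand-in for a captured frame: one string per row.
/// Rows are joined with '\n' so ["ab", "c"] and ["a", "bc"] hash differently.
fn frame_checksum(rows: &[&str]) -> u64 {
    fnv1a_64(rows.join("\n").as_bytes())
}

fn main() {
    // Identical renders in both lanes produce identical checksums.
    let baseline = frame_checksum(&["ticks=1"]);
    let candidate = frame_checksum(&["ticks=1"]);
    // A one-character change produces a different checksum.
    let drifted = frame_checksum(&["ticks=2"]);
    println!("baseline={baseline:016x} candidate={candidate:016x} drifted={drifted:016x}");
    assert_eq!(baseline, candidate);
    assert_ne!(baseline, drifted);
}
```

Because each FNV-1a step is a bijection on the 64-bit state, two equal-length inputs that differ in a single byte always hash differently. That covers the "ticks=1" vs "ticks=2" case above.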

API at a glance

`ShadowRunConfig`


```rust
pub struct ShadowRunConfig {
    pub prefix: String,
    pub scenario_name: String,
    pub seed: u64,
    pub viewport_width: u16,
    pub viewport_height: u16,
    pub time_step_ms: u64,
    pub baseline_label: String,   // default "baseline"
    pub candidate_label: String,  // default "candidate"
}
```

Builder defaults: 80×24 viewport, 16 ms time step, lane labels `baseline` and `candidate`.

| Method | Purpose |
|---|---|
| `ShadowRunConfig::new(prefix, scenario_name, seed)` | Construct with defaults. |
| `.viewport(w, h)` | Override the frame capture size. |
| `.time_step_ms(ms)` | Override the deterministic tick cadence. |
| `.lane_labels(baseline, candidate)` | Rename the lanes in JSONL output. |
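The following hypothetical `ConfigSketch` mirrors the fields above to show how the builder calls compose with the defaults. It is not the ftui-harness type. It only illustrates the consume-and-return builder style and the documented defaults: 80×24, 16 ms, `baseline`, and `candidate`.

```rust
/// Hypothetical mirror of ShadowRunConfig; not the real ftui-harness type.
#[derive(Debug, Clone)]
struct ConfigSketch {
    prefix: String,
    scenario_name: String,
    seed: u64,
    viewport_width: u16,
    viewport_height: u16,
    time_step_ms: u64,
    baseline_label: String,
    candidate_label: String,
}

impl ConfigSketch {
    fn new(prefix: impl Into<String>, scenario_name: impl Into<String>, seed: u64) -> Self {
        Self {
            prefix: prefix.into(),
            scenario_name: scenario_name.into(),
            seed,
            viewport_width: 80,  // documented default
            viewport_height: 24, // documented default
            time_step_ms: 16,    // documented default
            baseline_label: "baseline".to_string(),
            candidate_label: "candidate".to_string(),
        }
    }

    // Each builder method takes `self` by value and returns it, so calls chain.
    fn viewport(mut self, w: u16, h: u16) -> Self {
        self.viewport_width = w;
        self.viewport_height = h;
        self
    }

    fn time_step_ms(mut self, ms: u64) -> Self {
        self.time_step_ms = ms;
        self
    }

    fn lane_labels(mut self, baseline: impl Into<String>, candidate: impl Into<String>) -> Self {
        self.baseline_label = baseline.into();
        self.candidate_label = candidate.into();
        self
    }
}

fn main() {
    // Only the viewport and labels are overridden; time_step_ms stays at 16.
    let cfg = ConfigSketch::new("migration_test", "tick_counter", 42)
        .viewport(120, 40)
        .lane_labels("threading", "asupersync");
    println!("{cfg:?}");
}
```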

`ShadowVerdict`


```rust
pub enum ShadowVerdict {
    Match,     // every frame checksum matched
    Diverged,  // at least one frame checksum differed
}
```

`ShadowRunResult`

| Field | Type | Meaning |
|---|---|---|
| `verdict` | `ShadowVerdict` | Overall outcome. |
| `scenario_name` | `String` | Scenario label, reused in JSONL output. |
| `seed` | `u64` | Seed used for both lanes. |
| `frame_comparisons` | `Vec<FrameComparison>` | Per-frame `(index, baseline_checksum, candidate_checksum, matched)`. |
| `first_divergence` | `Option<usize>` | Index of the first mismatched frame, if any. |
| `frames_compared` | `usize` | Number of zipped frame pairs. |
| `baseline` / `candidate` | `LabOutput` | Full per-lane record (frames, events, anomalies). |
| `baseline_label` / `candidate_label` | `String` | Human-readable lane names. |
| `run_total` | `u64` | Monotonic counter from `shadow_runs_total()`. |

Helpers: `diverged_count()` and `match_ratio()`.
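The helpers and `first_divergence` can be derived from `frame_comparisons` alone. The sketch below mirrors `FrameComparison` as a hypothetical local struct. The formulas are assumptions about what the helpers compute: mismatches counted, and matched frames divided by compared frames. Returning 0.0 for an empty comparison is a conservative choice made by this sketch, not documented behaviour.

```rust
/// Hypothetical mirror of FrameComparison, using the fields in the table above.
#[derive(Debug, Clone, PartialEq)]
struct FrameComparison {
    index: usize,
    baseline_checksum: u64,
    candidate_checksum: u64,
    matched: bool,
}

/// Build a comparison whose `matched` flag reflects checksum equality.
fn mk(index: usize, baseline: u64, candidate: u64) -> FrameComparison {
    FrameComparison {
        index,
        baseline_checksum: baseline,
        candidate_checksum: candidate,
        matched: baseline == candidate,
    }
}

/// Number of frames whose checksums differed.
fn diverged_count(cmps: &[FrameComparison]) -> usize {
    cmps.iter().filter(|c| !c.matched).count()
}

/// Matched frames as a fraction of compared frames.
/// Assumption: an empty comparison reports 0.0 so that ratio gates fail closed.
fn match_ratio(cmps: &[FrameComparison]) -> f64 {
    if cmps.is_empty() {
        return 0.0;
    }
    let matched = cmps.len() - diverged_count(cmps);
    matched as f64 / cmps.len() as f64
}

/// Frame index of the first mismatch, if any.
fn first_divergence(cmps: &[FrameComparison]) -> Option<usize> {
    cmps.iter().find(|c| !c.matched).map(|c| c.index)
}

fn main() {
    let cmps = vec![mk(0, 0xa, 0xa), mk(1, 0xb, 0xb), mk(2, 0xc, 0xd), mk(3, 0xe, 0xe)];
    println!(
        "diverged={} ratio={} first={:?}",
        diverged_count(&cmps),
        match_ratio(&cmps),
        first_divergence(&cmps)
    ); // diverged=1 ratio=0.75 first=Some(2)
}
```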

`ShadowRun::compare`


```text
ShadowRun::compare(config, model_factory, scenario_fn) -> ShadowRunResult
```

`model_factory: impl Fn() -> M` is called twice, once per lane, so each `LabSession` owns an independent model instance.
`scenario_fn: impl Fn(&mut LabSession<M>)` drives the session. It must be deterministic: no wall-clock reads, and no RNG without a seeded source.
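The shape of `ShadowRun::compare` can be sketched generically. The sketch calls the factory once per lane, runs the same scenario against each fresh session, and returns both lanes' recorded values. `Session`, `compare`, and `Counter` are hypothetical, std-only stand-ins for `LabSession`, `ShadowRun::compare`, and a model. Real sessions record FNV-1a frame checksums rather than raw counters.

```rust
/// Hypothetical minimal lane session: a model plus per-frame "checksums".
struct Session<M> {
    model: M,
    checksums: Vec<u64>,
}

/// Sketch of the compare shape: one fresh model per lane, same scenario for both.
fn compare<M, F, S>(model_factory: F, scenario_fn: S) -> (Vec<u64>, Vec<u64>)
where
    F: Fn() -> M,
    S: Fn(&mut Session<M>),
{
    // Each call builds a new model, so no state is shared between lanes.
    let run_lane = || {
        let mut session = Session { model: model_factory(), checksums: Vec::new() };
        scenario_fn(&mut session);
        session.checksums
    };
    (run_lane(), run_lane()) // (baseline, candidate)
}

struct Counter {
    ticks: u64,
}

fn main() {
    let (baseline, candidate) = compare(
        || Counter { ticks: 0 },
        |s: &mut Session<Counter>| {
            for _ in 0..3 {
                s.model.ticks += 1;
                // Stand-in for a frame checksum: the tick count itself.
                s.checksums.push(s.model.ticks);
            }
        },
    );
    println!("baseline={baseline:?} candidate={candidate:?}"); // both [1, 2, 3]
    assert_eq!(baseline, candidate);
}
```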

All comparison evidence is emitted as JSONL via `TestJsonlLogger`.

Worked example: certifying a runtime migration

tests/migration_shadow.rs


```rust
use ftui_harness::shadow_run::{ShadowRun, ShadowRunConfig, ShadowVerdict};
use ftui_runtime::program::{Cmd, Model};
use ftui_render::frame::Frame;
use ftui_core::event::Event;
use ftui_core::geometry::Rect;
use ftui_widgets::paragraph::Paragraph;

#[derive(Clone, Debug)]
enum Msg { Tick }

// Every incoming event, including the Event::Tick injected by session.tick(),
// maps to Msg::Tick, so each tick bumps the counter by one.
impl From<Event> for Msg {
    fn from(_: Event) -> Self { Msg::Tick }
}

#[derive(Default)]
struct TickCounter { ticks: u64 }

impl Model for TickCounter {
    type Message = Msg;
    fn update(&mut self, _: Msg) -> Cmd<Msg> {
        self.ticks += 1;
        Cmd::none()
    }
    // Render "ticks=N" on the first row, so the frame (and its checksum)
    // changes on every tick.
    fn view(&self, frame: &mut Frame) {
        let s = format!("ticks={}", self.ticks);
        let area = Rect::new(0, 0, frame.width(), 1);
        Paragraph::new(s.as_str()).render(area, frame);
    }
}

#[test]
fn runtime_migration_preserves_frames() {
    let config = ShadowRunConfig::new("migration_test", "tick_counter", 42)
        .viewport(80, 24)
        .time_step_ms(16)
        .lane_labels("threading", "asupersync");

    let result = ShadowRun::compare(
        config,
        // Called once per lane, so each lane gets a fresh TickCounter.
        TickCounter::default,
        |session| {
            session.init();
            // Fixed iteration count: both lanes capture exactly 30 frames.
            for _ in 0..30 {
                session.tick();
                session.capture_frame();
            }
        },
    );

    assert_eq!(result.verdict, ShadowVerdict::Match);
    assert_eq!(result.first_divergence, None);
    // Rule out a vacuous match where fewer frames than expected were compared.
    assert_eq!(result.frames_compared, 30);
    assert!(result.match_ratio() > 0.999);
}
```

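On a divergence, report the first mismatched frame with both checksums:

```rust
match result.verdict {
    ShadowVerdict::Match => {}
    ShadowVerdict::Diverged => match result.first_divergence {
        Some(idx) => {
            let fc = &result.frame_comparisons[idx];
            panic!(
                "divergence at frame {}: baseline={:016x} candidate={:016x}",
                fc.index, fc.baseline_checksum, fc.candidate_checksum,
            );
        }
        // No mismatched frame recorded: check frame counts (see Pitfalls).
        None => panic!(
            "lanes diverged after {} compared frames",
            result.frames_compared,
        ),
    },
}
```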

`LabSession` methods used inside the scenario

| Method | Effect |
|---|---|
| `init()` | Calls `Model::init` and logs the viewport to JSONL. |
| `send(msg)` | Dispatches a message and records it in the event log. |
| `inject_event(evt)` / `inject_events(&evts)` | Injects real `Event`s, converted to messages via `From<Event>`. |
| `tick()` | Injects `Event::Tick` and advances the deterministic clock by `time_step_ms`. |
| `capture_frame()` | Renders at the configured viewport and stores the frame's FNV-1a checksum. |
| `capture_frame_at(w, h)` | Same as `capture_frame()`, at custom dimensions. |
| `frame_records()` / `event_log()` | Inspect this lane's recorded frames and events. |
| `now_ms()` | Deterministic milliseconds since session start. |
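`tick()` and `now_ms()` together imply a clock that moves only when the scenario advances it. The minimal sketch below assumes time starts at 0 and each tick advances it by exactly `time_step_ms`. `DeterministicClock` is a hypothetical name, not a harness type.

```rust
/// Hypothetical deterministic clock: it never reads the wall clock,
/// so both lanes observe exactly the same timestamps.
struct DeterministicClock {
    now_ms: u64,
    time_step_ms: u64,
}

impl DeterministicClock {
    fn new(time_step_ms: u64) -> Self {
        Self { now_ms: 0, time_step_ms }
    }

    /// Advance by exactly one time step and return the new time.
    fn tick(&mut self) -> u64 {
        self.now_ms += self.time_step_ms;
        self.now_ms
    }

    fn now_ms(&self) -> u64 {
        self.now_ms
    }
}

fn main() {
    let mut clock = DeterministicClock::new(16);
    let stamps: Vec<u64> = (0..3).map(|_| clock.tick()).collect();
    println!("{stamps:?}"); // [16, 32, 48]
    println!("now={}", clock.now_ms()); // now=48
}
```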

Aggregating shadow runs for rollout decisions

`ShadowRun` is the raw primitive. For release gating, which combines multiple scenarios with an optional benchmark gate, wrap the results in a `RolloutScorecard`:

```rust
use ftui_harness::rollout_scorecard::{
    RolloutScorecard, RolloutScorecardConfig, RolloutVerdict,
};

let mut scorecard = RolloutScorecard::new(
    RolloutScorecardConfig::default()
        .min_shadow_scenarios(3)
        .min_match_ratio(1.0)
        .require_benchmark_pass(true),
);

// result_a, result_b, result_c: ShadowRunResult values from three ShadowRun::compare calls.
scorecard.add_shadow_result(result_a);
scorecard.add_shadow_result(result_b);
scorecard.add_shadow_result(result_c);
// gate_result: the outcome of the separately run benchmark gate.
scorecard.set_benchmark_gate(gate_result);

assert_eq!(scorecard.evaluate(), RolloutVerdict::Go);
```
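The three scorecard knobs suggest a simple gating rule. The sketch below is one plausible reading. It assumes every scenario must meet `min_match_ratio` individually, and that a required benchmark must be present and passing. `Gate`, `evaluate`, and `SketchVerdict` are hypothetical and may not match how `RolloutScorecard::evaluate` actually weighs its inputs.

```rust
/// Hypothetical two-valued verdict; the real RolloutVerdict may have more variants.
#[derive(Debug, PartialEq)]
enum SketchVerdict {
    Go,
    NoGo,
}

/// Hypothetical mirror of the three RolloutScorecardConfig knobs.
struct Gate {
    min_shadow_scenarios: usize,
    min_match_ratio: f64,
    require_benchmark_pass: bool,
}

fn evaluate(gate: &Gate, match_ratios: &[f64], benchmark_passed: Option<bool>) -> SketchVerdict {
    // Not enough shadow-run evidence yet.
    if match_ratios.len() < gate.min_shadow_scenarios {
        return SketchVerdict::NoGo;
    }
    // Every scenario must clear the match-ratio floor on its own.
    if match_ratios.iter().any(|&r| r < gate.min_match_ratio) {
        return SketchVerdict::NoGo;
    }
    // A required benchmark must have been recorded and must have passed.
    if gate.require_benchmark_pass && benchmark_passed != Some(true) {
        return SketchVerdict::NoGo;
    }
    SketchVerdict::Go
}

fn main() {
    let gate = Gate { min_shadow_scenarios: 3, min_match_ratio: 1.0, require_benchmark_pass: true };
    // Three perfect shadow runs plus a passing benchmark.
    println!("{:?}", evaluate(&gate, &[1.0, 1.0, 1.0], Some(true))); // Go
    // One scenario below the 1.0 floor.
    println!("{:?}", evaluate(&gate, &[1.0, 0.97, 1.0], Some(true))); // NoGo
}
```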

Pitfalls

The scenario closure must be deterministic. Reading the system clock, calling `rand::thread_rng()`, or touching global state will make the two lanes diverge. Route randomness through a seeded source and time through `session.now_ms()`.
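Any seeded generator satisfies the "seeded source" rule. The sketch below uses SplitMix64, a small std-only PRNG chosen here for brevity; ftui-harness is not documented to provide it. Constructing it from the same seed in each lane yields identical draws, which `thread_rng()` cannot guarantee.

```rust
/// SplitMix64: a tiny seeded PRNG using the standard constants.
struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    fn new(seed: u64) -> Self {
        Self { state: seed }
    }

    fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Value in 0..n; modulo bias is ignored in this sketch.
    fn below(&mut self, n: u64) -> u64 {
        self.next_u64() % n
    }
}

fn main() {
    // Two lanes seeded identically draw identical sequences.
    let mut baseline = SplitMix64::new(42);
    let mut candidate = SplitMix64::new(42);
    let a: Vec<u64> = (0..5).map(|_| baseline.below(100)).collect();
    let b: Vec<u64> = (0..5).map(|_| candidate.below(100)).collect();
    assert_eq!(a, b);
    println!("{a:?}");
}
```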

Frame counts must match. If the scenario captures 30 frames in one lane and 29 in the other (for example, because of an early `Cmd::Quit` in the candidate only), `frames_compared` drops and the count mismatch is reported as a divergence. Prefer scenarios that run for a fixed number of iterations.
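The frame-count pitfall comes from how pairwise comparison works. Rust's `Iterator::zip` stops at the shorter input, so a missing trailing frame is invisible unless lengths are checked explicitly. The std-only sketch below assumes checksums are plain `u64`s. It reports a length mismatch at the first frame present in only one lane; that is this sketch's choice, and the harness may record it differently.

```rust
/// Index of the first divergent frame between two lanes, if any.
/// Checksum mismatches are found via zip; length mismatches need a separate check
/// because zip silently drops the longer lane's extra frames.
fn first_divergence(baseline: &[u64], candidate: &[u64]) -> Option<usize> {
    if let Some(i) = baseline.iter().zip(candidate).position(|(b, c)| b != c) {
        return Some(i);
    }
    if baseline.len() != candidate.len() {
        // First frame that exists in only one lane.
        return Some(baseline.len().min(candidate.len()));
    }
    None
}

fn main() {
    let baseline: [u64; 3] = [0x11, 0x22, 0x33];
    let candidate: [u64; 2] = [0x11, 0x22]; // candidate stopped one frame early
    // A zip-only check sees nothing wrong.
    let zip_only_ok = baseline.iter().zip(&candidate).all(|(b, c)| b == c);
    println!("zip-only says match: {zip_only_ok}"); // true
    println!("first divergence: {:?}", first_divergence(&baseline, &candidate)); // Some(2)
}
```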

Don’t share a Model across lanes. Build each lane's model through the `model_factory` closure. Constructing one model and cloning it leaks cross-lane state for anything behind `Arc` or `Rc`, because the clone points at the same allocation.
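Cloning leaks state because `Clone` on an `Rc` copies the pointer, not the data. The hypothetical `CachedModel` below keeps a counter behind `Rc<RefCell<_>>`. Cloning one instance makes both "lanes" share that counter. Calling a constructor once per lane, as `model_factory` does, gives each lane its own allocation.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical model whose state lives behind Rc (e.g. a shared cache).
#[derive(Clone)]
struct CachedModel {
    hits: Rc<RefCell<u64>>,
}

impl CachedModel {
    fn new() -> Self {
        Self { hits: Rc::new(RefCell::new(0)) }
    }
    fn update(&self) {
        *self.hits.borrow_mut() += 1;
    }
    fn hits(&self) -> u64 {
        *self.hits.borrow()
    }
}

fn main() {
    // Anti-pattern: build once, clone per lane. Both clones share one counter.
    let baseline = CachedModel::new();
    let candidate = baseline.clone();
    baseline.update();
    println!("cloned: candidate sees {}", candidate.hits()); // 1

    // Factory pattern: each lane gets an independent allocation.
    let factory = CachedModel::new;
    let (b, c) = (factory(), factory());
    b.update();
    println!("factory: candidate sees {}", c.hits()); // 0
}
```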

Operational links

- Runtime rollout: shadow-run
- Runtime rollout: scorecard
- Runtime rollout: lanes
- ProgramSimulator
- Determinism fixtures
- doctor_frankentui determinism soak

Testing overview: how this piece fits in testing.