# Rollout scorecard
Shadow runs prove behavioural equivalence; benchmarks prove
performance parity. The rollout scorecard is the single object
that combines the two into a structured `RolloutVerdict` (`Go`,
`NoGo`, or `Inconclusive`) and emits a self-contained evidence
bundle that operators can attach to a release decision.
File: `crates/ftui-harness/src/rollout_scorecard.rs`.
## Configuration
```rust
pub struct RolloutScorecardConfig {
    /// Minimum number of shadow-run scenarios required. Default: 1.
    pub min_shadow_scenarios: usize,
    /// Minimum frame match ratio across all shadow runs (0.0..=1.0). Default: 1.0.
    pub min_match_ratio: f64,
    /// Whether a passing benchmark gate is required for Go. Default: false.
    pub require_benchmark_pass: bool,
}

impl RolloutScorecardConfig {
    pub fn min_shadow_scenarios(self, n: usize) -> Self;
    pub fn min_match_ratio(self, ratio: f64) -> Self; // clamped to 0..=1
    pub fn require_benchmark_pass(self, required: bool) -> Self;
}
```

Defaults are conservative: one shadow run and a 100% frame match. For a production rollout, require several scenarios:
```rust
let cfg = RolloutScorecardConfig::default()
    .min_shadow_scenarios(5)
    .min_match_ratio(1.0)
    .require_benchmark_pass(true);
```

## Verdict
```rust
pub enum RolloutVerdict {
    Go,           // All evidence meets thresholds.
    NoGo,         // Determinism or performance regression detected.
    Inconclusive, // Not enough evidence to decide.
}
```

`Go` requires all of:
- `shadow_results.len() >= min_shadow_scenarios`
- `aggregate_match_ratio() >= min_match_ratio`
- No `ShadowVerdict::Diverged` present.
- If `require_benchmark_pass`, a `GateResult` must be attached and `gate.passed()` must hold.
Anything short of that returns Inconclusive (missing evidence) or
NoGo (evidence says “don’t”).
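The ordering of these rules (hard evidence against the rollout beats missing evidence, and missing evidence is never `Go`) can be sketched as a small decision function. Everything below is a simplified stand-in for the crate's real types, not its implementation; in particular, whether the aggregate ratio is frame-weighted or scenario-averaged is an assumption here:

```rust
/// Simplified stand-in for a shadow-run result.
struct Shadow {
    diverged: bool,
    match_ratio: f64,
}

#[derive(Debug, PartialEq)]
enum Verdict {
    Go,
    NoGo,
    Inconclusive,
}

/// Sketch of the Go / NoGo / Inconclusive decision described above.
fn evaluate(
    shadows: &[Shadow],
    gate_passed: Option<bool>, // None = no benchmark gate attached
    min_scenarios: usize,
    min_ratio: f64,
    require_gate: bool,
) -> Verdict {
    // Hard evidence against the rollout wins first: any divergence,
    // or a failed benchmark gate, is NoGo.
    if shadows.iter().any(|s| s.diverged) {
        return Verdict::NoGo;
    }
    if require_gate && gate_passed == Some(false) {
        return Verdict::NoGo;
    }
    // Missing evidence is Inconclusive, never Go.
    if shadows.len() < min_scenarios {
        return Verdict::Inconclusive;
    }
    if require_gate && gate_passed.is_none() {
        return Verdict::Inconclusive;
    }
    // Scenario-averaged ratio (an assumption; the crate may frame-weight).
    let ratio: f64 =
        shadows.iter().map(|s| s.match_ratio).sum::<f64>() / shadows.len() as f64;
    if ratio < min_ratio {
        return Verdict::NoGo;
    }
    Verdict::Go
}
```

Note that a below-threshold match ratio is `NoGo`, not `Inconclusive`: the evidence exists and it says "don't".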
## API
```rust
pub struct RolloutScorecard { /* ... */ }

impl RolloutScorecard {
    pub fn new(config: RolloutScorecardConfig) -> Self;
    pub fn add_shadow_result(&mut self, result: ShadowRunResult);
    pub fn set_benchmark_gate(&mut self, result: GateResult);
    pub fn shadow_scenario_count(&self) -> usize;
    pub fn shadow_match_count(&self) -> usize;
    pub fn aggregate_match_ratio(&self) -> f64;
    pub fn evaluate(&self) -> RolloutVerdict;
    pub fn summary(&self) -> RolloutSummary;
}
```

## Evidence bundle
`RolloutEvidenceBundle` is the release artefact: JSON that combines
the scorecard verdict with runtime-observed telemetry and lane
metadata:
```rust
pub struct RolloutEvidenceBundle {
    pub scorecard: RolloutSummary,
    pub queue_telemetry: Option<QueueTelemetry>,
    pub requested_lane: String,
    pub resolved_lane: String,
    pub rollout_policy: String,
}

impl RolloutEvidenceBundle {
    pub fn to_json(&self) -> String;
}
```

A shortened example of what `to_json()` produces:
```json
{
  "schema_version": "1.0.0",
  "scorecard": {
    "verdict": "GO",
    "shadow_scenarios": 5,
    "shadow_matches": 5,
    "aggregate_match_ratio": 1.0,
    "total_frames_compared": 4800,
    "benchmark_passed": "pass",
    "config": {
      "min_shadow_scenarios": 5,
      "min_match_ratio": 1.0,
      "benchmark_required": true
    }
  },
  "queue_telemetry": {
    "enqueued": 1234,
    "processed": 1234,
    "dropped": 0,
    "high_water": 12,
    "in_flight": 0
  },
  "runtime": {
    "requested_lane": "structured",
    "resolved_lane": "structured",
    "rollout_policy": "shadow"
  }
}
```

## Worked example
```rust
use ftui_harness::{
    rollout_scorecard::{RolloutScorecard, RolloutScorecardConfig, RolloutVerdict},
    shadow_run::{ShadowRun, ShadowRunConfig},
};

#[test]
fn structured_is_go_for_counter_scenarios() {
    let scenarios = [
        ("increment", 42),
        ("reset_on_zero", 43),
        ("burst_ticks", 44),
        ("resize", 45),
        ("quit_path", 46),
    ];
    let mut scorecard = RolloutScorecard::new(
        RolloutScorecardConfig::default()
            .min_shadow_scenarios(5)
            .min_match_ratio(1.0),
    );
    for (name, seed) in scenarios {
        let cfg = ShadowRunConfig::new("rollout/counter", name, seed)
            .viewport(40, 10);
        let result = ShadowRun::compare(cfg, || CounterModel::new(), |s| {
            s.init();
            for _ in 0..30 {
                s.tick();
                s.capture_frame();
            }
        });
        scorecard.add_shadow_result(result);
    }

    assert_eq!(scorecard.evaluate(), RolloutVerdict::Go);
    let summary = scorecard.summary();
    std::fs::write("rollout_summary.json", summary.to_json()).unwrap();
}
```

Attach a benchmark gate (see the `benchmark_gate` module) to require
performance parity:
```rust
scorecard.set_benchmark_gate(gate_result);
```

## Reading a verdict in CI
```sh
# Fail CI unless the scorecard said Go:
jq -r '.scorecard.verdict' rollout_summary.json | grep -qx 'GO'

# Fleet dashboard: count evidence bundles by resolved lane
jq -r '.runtime.resolved_lane' **/rollout_evidence.json | sort | uniq -c
```

## Pitfalls
**`min_match_ratio < 1.0` lets divergence slip through.** The ratio
helps for long-running scenarios with known-benign noise (e.g.
external timestamps baked into the UI), but it is not a substitute
for fixing the source. Prefer a deterministic harness first; lower
the ratio only after you know why frames diverge.
**Scorecard `Inconclusive` is not `Go`.** CI gates must distinguish
`Go` from `Inconclusive`. The default config accepts a single
scenario, which is rarely enough to declare parity; bump
`min_shadow_scenarios` for release builds.
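When the gate lives in code rather than a jq one-liner, giving `Inconclusive` an explicit branch keeps it from silently riding along with `Go`. A minimal sketch with a stand-in enum; the exit codes here are illustrative conventions, not part of the crate:

```rust
#[derive(Debug)]
enum Verdict {
    Go,
    NoGo,
    Inconclusive,
}

/// Map a verdict to a CI exit status: only Go succeeds.
fn exit_code(verdict: &Verdict) -> i32 {
    match verdict {
        Verdict::Go => 0,
        Verdict::NoGo => 1,
        // Distinct from NoGo so "collect more evidence" is visible in
        // CI logs, but still a failure: Inconclusive must never ship.
        Verdict::Inconclusive => 2,
    }
}
```

The exhaustive `match` is the point: adding a new verdict variant later forces every CI gate in code to decide what it means, instead of defaulting to success.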