Mondrian Conformal Frame-Time Gating

What goes wrong with a naive approach

Vanilla conformal prediction assumes residuals are exchangeable across the whole stream. For frame time, that assumption is false. A frame that rendered an inline prompt has a completely different cost profile from a frame that rendered a full-screen table. Pooling them gives a single quantile dominated by the hard cases: the easy frames are heavily over-covered, while the hard ones are covered only marginally and can fall below the target coverage.

Three failure modes pop up immediately:

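Cross-regime pollution. A 20 ms outlier from a scroll burst raises the global quantile for as long as it stays in the calibration set; steady-state frames are gated far too conservatively.
Insufficient samples per regime. You have thousands of steady frames and ten scroll-burst frames. A per-bucket quantile over 10 points is useless: at $\alpha = 0.05$ the required rank $\lceil 0.95 \times 11 \rceil = 11$ exceeds the sample count, so no finite bound exists.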
Silent mispicks. The diff-strategy picker changes which strategy it uses as the workload drifts; the conformal layer has no way to know that the calibration set just became obsolete.

The right answer is Mondrian conformal prediction (Vovk, 2003): partition the calibration set into buckets that are conditionally exchangeable, quote per-bucket quantiles, and fall back up a hierarchy when a bucket is too sparse.

Mental model

Partition frames by a tuple of context keys:

$$(M, D, S) = (\text{screen mode},\; \text{diff strategy},\; \text{size class})$$

Keep a separate residual window per $(M, D, S)$. On each frame:

Look up the bucket’s window.
If $|\text{window}| \ge 20$, use its quantile.
Else fall back to $(M, D, \star)$ — any size class.
Else fall back to $(M, \star, \star)$ — any strategy.
Else fall back to $(\star, \star, \star)$ — global.

The fallback hierarchy is crucial: the system stays calibrated and always has a bound to quote, even in corner buckets that have never been hit.
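
The lookup steps above can be sketched as follows. This is a minimal illustration, not the crate's implementation: the `Mode`, `Diff`, and `Size` enums, the `Pattern` alias, and the `Windows` type are hypothetical names, and the sketch assumes each residual is recorded at every level of the hierarchy so that a coarse bucket pools the samples of its children.

```rust
use std::collections::{HashMap, VecDeque};

// Hypothetical context enums; the variants are taken from the examples
// on this page and may not match the real crate.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
pub enum Mode { Alt, Inline }
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
pub enum Diff { Dirty, Full }
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
pub enum Size { Small, Medium, Large }

/// A bucket pattern; `None` stands for the wildcard.
pub type Pattern = (Option<Mode>, Option<Diff>, Option<Size>);

/// The four patterns a frame belongs to, finest first.
pub fn fallback_chain(m: Mode, d: Diff, s: Size) -> [Pattern; 4] {
    [
        (Some(m), Some(d), Some(s)),
        (Some(m), Some(d), None),
        (Some(m), None, None),
        (None, None, None),
    ]
}

/// One bounded residual window per pattern (hypothetical container).
#[derive(Default)]
pub struct Windows {
    by_pattern: HashMap<Pattern, VecDeque<i64>>,
}

impl Windows {
    /// Record a residual at every level of the chain, evicting the
    /// oldest entry once a window holds `cap` residuals.
    pub fn observe(&mut self, m: Mode, d: Diff, s: Size, residual_us: i64, cap: usize) {
        for p in fallback_chain(m, d, s).iter() {
            let w = self.by_pattern.entry(*p).or_default();
            if w.len() >= cap {
                w.pop_front();
            }
            w.push_back(residual_us);
        }
    }

    /// Finest level holding at least `min_samples` residuals, with its
    /// window; `None` if even the global window is too short.
    pub fn resolve(&self, m: Mode, d: Diff, s: Size, min_samples: usize)
        -> Option<(usize, &VecDeque<i64>)>
    {
        for (level, p) in fallback_chain(m, d, s).iter().enumerate() {
            if let Some(w) = self.by_pattern.get(p) {
                if w.len() >= min_samples {
                    return Some((level, w));
                }
            }
        }
        None
    }
}
```

Returning the level alongside the window makes it cheap to log which rung of the hierarchy produced the bound; a `None` result is the case where the caller would quote the backstop constant instead.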

Mondrian is conformal prediction plus a coarsening operator. Fine-grained buckets give precise bounds where you have data; the coarsening hierarchy gives you a valid bound where you don’t. The same theorem covers both levels.

The math

Per-bucket quantile

For each bucket $B_k$ with calibration residuals $R^{(k)}_1, \ldots, R^{(k)}_{n_k}$, let $R^{(k)}_{(i)}$ denote the $i$-th smallest. Then:

$$q_k = R^{(k)}_{(\lceil (1-\alpha)(n_k + 1) \rceil)} \qquad \text{if } n_k \ge n_{\min}$$

Otherwise fall through to the parent bucket.
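
A minimal sketch of the quantile rule, assuming residuals are stored as signed microseconds (observed minus predicted; the sign convention is an assumption, as is the function name `conformal_quantile`). It returns `None` when the required rank exceeds the sample count, which is the case where a caller would fall through to the parent bucket.

```rust
/// Conformal quantile of a residual window: the
/// ceil((1 - alpha) * (n + 1))-th smallest residual (1-based rank).
/// Returns `None` when the window is empty or the rank exceeds `n`,
/// i.e. when no finite bound exists at this sample size.
pub fn conformal_quantile(residuals: &[i64], alpha: f64) -> Option<i64> {
    let n = residuals.len();
    if n == 0 {
        return None;
    }
    let rank = ((1.0 - alpha) * (n as f64 + 1.0)).ceil() as usize;
    if rank == 0 || rank > n {
        return None;
    }
    // Sort a copy so the caller's window keeps its arrival order.
    let mut sorted = residuals.to_vec();
    sorted.sort_unstable();
    Some(sorted[rank - 1])
}
```

With $\alpha = 0.05$ and 20 residuals the rank is $\lceil 19.95 \rceil = 20$, so the bound is the largest residual in the window; with 10 residuals the rank is 11 and the function returns `None`.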

Fallback hierarchy

$$(M, D, S) \longrightarrow (M, D, \star) \longrightarrow (M, \star, \star) \longrightarrow (\star, \star, \star)$$

At the root, if even the global window is short, quote the fallback budget (default 16000 µs) — a hard engineering constant beyond which we always degrade.

Upper bound on frame time

For a frame with predicted time $\hat{y}$ in bucket $k$:

$$\hat{y}^+ = \hat{y} + q_k$$

Gating rule: if $\hat{y}^+ > \text{budget}$, trigger degradation. See control theory for what degradation looks like.
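
The gating rule amounts to one addition and one comparison. The sketch below is illustrative only: `GateDecision` and `gate` are hypothetical names, the field names simply mirror the `upper_us` and `exceeds_budget` fields of the evidence log, and hysteresis is deliberately left out.

```rust
/// Outcome of one gating decision (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub struct GateDecision {
    pub upper_us: u64,
    pub exceeds_budget: bool,
}

/// Add the bucket quantile to the prediction and compare with the budget.
pub fn gate(y_hat_us: u64, q_us: i64, budget_us: u64) -> GateDecision {
    // The quantile can be negative if the predictor tends to over-estimate,
    // so sum in signed arithmetic and clamp the bound at zero.
    let upper = (y_hat_us as i64).saturating_add(q_us).max(0) as u64;
    GateDecision {
        upper_us: upper,
        // Strict inequality: a bound exactly on the budget does not degrade.
        exceeds_budget: upper > budget_us,
    }
}
```

For example, a prediction of 14 200 µs with a quantile of 1 600 µs gives an upper bound of 15 800 µs, which stays under a 16 000 µs budget.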

Defaults

Parameter	Default
$\alpha$	0.05
$n_{\min}$	20
window size per bucket	256
fallback budget	16 000 µs (≈ 60 fps)
hysteresis $\eta$	1.1

Worked example — a bucket fallback

Suppose you have the following bucket populations after 30 seconds of running the showcase:

Bucket	Samples
`(alt, dirty, medium)`	512
`(alt, full, large)`	23
`(alt, full, small)`	3
`(inline, dirty, medium)`	40
`(inline, dirty, small)`	1

Per-frame resolution:

(alt, dirty, medium): 512 ≥ 20, use the bucket’s own $q$.
(alt, full, large): 23 ≥ 20, use the bucket’s own $q$.
(alt, full, small): 3 < 20, fall back to (alt, full, ★), which has $23 + 3 = 26$ samples. Use that $q$.
(inline, dirty, medium): 40 ≥ 20, use the bucket’s own $q$.
(inline, dirty, small): 1 < 20, fall back to (inline, dirty, ★), which has $40 + 1 = 41$ samples. Use that $q$.

At each fallback, the theorem still applies because exchangeability only needs to hold within the coarsened bucket.
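
To make the example concrete, the rank that the per-bucket quantile formula selects at the default $\alpha = 0.05$ can be worked out for the sample counts that appear in it (a worked illustration of the formula, not measured output):

```latex
\begin{aligned}
n = 23 &: \quad \lceil 0.95 \times 24 \rceil = \lceil 22.8 \rceil = 23 \\
n = 26 &: \quad \lceil 0.95 \times 27 \rceil = \lceil 25.65 \rceil = 26 \\
n = 40 &: \quad \lceil 0.95 \times 41 \rceil = \lceil 38.95 \rceil = 39 \\
n = 41 &: \quad \lceil 0.95 \times 42 \rceil = \lceil 39.9 \rceil = 40
\end{aligned}
```

At 23 and 26 samples the quoted bound is the largest residual in the window, so a single outlier sets $q$ directly; at 40 and 41 samples the largest residual is excluded.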

Rust interface

crates/ftui-runtime/src/conformal_frame_guard.rs


use ftui_runtime::conformal_frame_guard::{
    ConformalFrameGuard, FrameGuardConfig, BucketKey,
};
 
let mut guard = ConformalFrameGuard::new(FrameGuardConfig {
    alpha: 0.05,
    min_samples: 20,
    window_size: 256,
    fallback_budget_us: 16_000,
    hysteresis: 1.1,
});
 
// After a frame, feed the context plus the predicted and observed times:
guard.observe(
    BucketKey { mode, diff_strategy, size_class },
    y_hat_us,
    observed_us,
);
 
// At the next frame, gate on the bound:
let decision = guard.decide(BucketKey { mode, diff_strategy, size_class }, y_hat_us);
if decision.exceeds_budget {
    degradation.step_down();
}

How to debug

Every decision emits a conformal_frame_guard line:


{"schema":"conformal_frame_guard","y_hat_us":14200,
 "upper_us":15800,"budget_us":16000,
 "exceeds_budget":false,
 "bucket":{"mode":"alt","diff":"dirty","size":"medium"},
 "fallback_level":0,"calibration_size":512}


FTUI_EVIDENCE_SINK=/tmp/ftui.jsonl cargo run -p ftui-demo-showcase
 
# How often did we fall back, and to which level?
jq -c 'select(.schema=="conformal_frame_guard")
       | .fallback_level' /tmp/ftui.jsonl | sort | uniq -c

Large counts at fallback_level=2 or 3 mean your bucketing keys are too fine — consider coarsening size_class so buckets fill up faster.

Pitfalls

Bucket keys must reflect cost regimes, not identifiers. Keying by pane ID or widget name produces dozens of identifier-level buckets that each need 20 samples. Key by what makes frames expensive: screen mode, diff strategy, size class. Leave widget identity out.
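
One way to keep a key tied to cost rather than identity is to derive it from a measured quantity. The sketch below is purely illustrative: the `SizeClass` enum, the `size_class` function, the choice of changed-cell count as the input, and both thresholds are hypothetical and would need tuning against real traces.

```rust
/// Hypothetical coarse size classes for the bucket key.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
pub enum SizeClass {
    Small,
    Medium,
    Large,
}

/// Map a raw changed-cell count onto a coarse cost regime.
/// The thresholds are placeholders, not values from the crate.
pub fn size_class(changed_cells: usize) -> SizeClass {
    match changed_cells {
        0..=255 => SizeClass::Small,
        256..=4095 => SizeClass::Medium,
        _ => SizeClass::Large,
    }
}
```

With the two screen modes, two diff strategies, and three size classes used in the examples on this page, that is at most 12 leaf buckets, so 240 well-spread samples are enough to bring every one of them up to 20.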

fallback_budget_us is a backstop, not a budget. It fires only when even the global window is below $n_{\min}$. Tune the real budget via the budget_us config and the degradation cascade; don’t lean on the backstop.

Cross-references

/operations/frame-budget: the top-level document describing how the guard, degradation, and PI pacing cooperate.
Vanilla conformal — the theorem that underpins every bucket’s quantile.
Control theory — PI + degradation cascade that consumes the exceeds_budget signal.

Where next

Intelligence overview: how this piece fits in intelligence.