Vanilla Conformal Prediction

What goes wrong with a naive approach

You have a predictor $\hat{y}$ for a quantity you care about — frame time, row height, allocation deltas — and you want an upper bound so the consumer can preallocate, gate, or alert safely. The textbook answers all fail in a TUI:

Parametric $\hat{y} + 2\sigma$ — assumes Gaussian residuals. Real residuals are heavy-tailed (one pathological frame, a single multi-line row, a flushed writer) and the Gaussian bound under-covers exactly when you need it most.
Empirical max — uses the worst-observed residual. Over-covers after one outlier forever; the bound never tightens.
Hand-tuned percentile — “use the 95th percentile of the last N samples”. Works, but “finite-sample coverage” is a hope, not a guarantee; and no one in the codebase knows why it’s 95 instead of 90.

Conformal prediction (Vovk et al., 2005) gives you the hand-tuned percentile with a proof. The only assumption is exchangeability of the calibration points, which is weaker than i.i.d., and the coverage guarantee is finite-sample: it holds at every calibration size $n$, not “in the limit”. A small $n$ only makes the bound wider.

Mental model

Keep a rolling buffer of calibration residuals $R_i = |y_i - \hat{y}_i|$. On each prediction, quote the bound:

\hat{y}^+ = \hat{y} + q, \qquad q = \text{Quantile}_{\lceil (1-\alpha)(n+1) \rceil / n} (R_{1..n})

With probability at least $1 - \alpha$ over the next observation, $y_{n+1} \le \hat{y}^+$. No distributional assumption beyond exchangeability. If residuals drift, the bound drifts with them.
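The buffer-plus-quantile loop can be sketched in a few lines of Rust. This is a minimal illustration, not the FrankenTUI implementation: the `RollingConformal` type and its method names are hypothetical, and returning `None` when the window is too small for a finite bound is a choice made for this sketch.

```rust
use std::collections::VecDeque;

/// Hypothetical rolling calibration window (illustration only).
pub struct RollingConformal {
    alpha: f64,
    max_len: usize,
    residuals: VecDeque<f64>,
}

impl RollingConformal {
    pub fn new(alpha: f64, max_len: usize) -> Self {
        assert!(alpha > 0.0 && alpha < 1.0, "alpha must lie in (0, 1)");
        assert!(max_len > 0, "window must hold at least one residual");
        Self { alpha, max_len, residuals: VecDeque::with_capacity(max_len) }
    }

    /// Record |y - y_hat| for a completed observation, evicting the oldest if full.
    pub fn observe(&mut self, y: f64, y_hat: f64) {
        if self.residuals.len() == self.max_len {
            self.residuals.pop_front();
        }
        self.residuals.push_back((y - y_hat).abs());
    }

    /// Lifted quantile q = R_(k) with k = ceil((1 - alpha)(n + 1)).
    /// Returns None when k > n, i.e. no finite bound exists yet.
    pub fn quantile(&self) -> Option<f64> {
        let n = self.residuals.len();
        // The small tolerance guards against float round-up just above an integer.
        let k = ((1.0 - self.alpha) * (n as f64 + 1.0) - 1e-9).ceil() as usize;
        if k == 0 || k > n {
            return None;
        }
        let mut sorted: Vec<f64> = self.residuals.iter().copied().collect();
        sorted.sort_by(f64::total_cmp);
        Some(sorted[k - 1])
    }

    /// Upper bound y_hat + q for the next observation.
    pub fn upper_bound(&self, y_hat: f64) -> Option<f64> {
        self.quantile().map(|q| y_hat + q)
    }
}
```

Sorting on every query costs O(n log n), which is unlikely to matter for a window of a few hundred residuals; a sorted structure maintained incrementally would be the obvious refinement if it did.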

Conformal prediction is the reverse of parametric uncertainty. Instead of asserting a noise model and computing a bound, you compute an empirical quantile and borrow the coverage guarantee from exchangeability. It is the most honest uncertainty estimator available for TUI workloads.

The math

Calibration and the lifted quantile

Given $n$ calibration residuals, the conformal quantile is the empirical quantile at level $\lceil (1-\alpha)(n+1) \rceil / n$. The $(n+1)$ lift is what secures the finite-sample guarantee: it accounts for the test point as if it were also in the calibration set.

q_n = R_{(\lceil (1-\alpha)(n+1) \rceil)}

(where $R_{(k)}$ is the $k$-th order statistic).
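As a stateless illustration of the formula, the rank and the order statistic can be computed directly from a slice. The function names `lifted_rank` and `conformal_quantile` are assumptions made for this sketch; it uses the standard library's partial selection rather than a full sort.

```rust
/// 1-indexed rank k = ceil((1 - alpha)(n + 1)) of the lifted quantile.
fn lifted_rank(n: usize, alpha: f64) -> usize {
    // The small tolerance guards against float round-up just above an integer.
    ((1.0 - alpha) * (n as f64 + 1.0) - 1e-9).ceil() as usize
}

/// q_n = R_(k), or None when k falls outside 1..=n.
fn conformal_quantile(residuals: &[f64], alpha: f64) -> Option<f64> {
    let n = residuals.len();
    let k = lifted_rank(n, alpha);
    if k == 0 || k > n {
        return None;
    }
    // Work on a copy so the caller's ordering is left untouched.
    let mut scratch = residuals.to_vec();
    let (_, kth, _) = scratch.select_nth_unstable_by(k - 1, f64::total_cmp);
    Some(*kth)
}
```

With $n = 200$ and $\alpha = 0.05$ this selects rank 191, i.e. quantile level $191/200 = 0.955$, slightly above the plain 95th percentile.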

Coverage theorem

If $R_1, \ldots, R_{n+1}$ are exchangeable, then:

P(R_{n+1} \le q_n) \ge 1 - \alpha

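Not asymptotic: the inequality holds for every finite $n$, with $q_n = +\infty$ whenever the rank $\lceil (1-\alpha)(n+1) \rceil$ exceeds $n$.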
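The argument behind the theorem is short. A sketch, under the simplifying assumption (added here) that there are no ties among the residuals and that $k = \lceil (1-\alpha)(n+1) \rceil \le n$:

```latex
\begin{aligned}
&\text{By exchangeability, } \operatorname{rank}(R_{n+1}) \text{ among } R_1, \ldots, R_{n+1}
  \text{ is uniform on } \{1, \ldots, n+1\}. \\
&R_{n+1} \le R_{(k)} \iff \operatorname{rank}(R_{n+1}) \le k
  \quad (R_{(k)} \text{ taken over the } n \text{ calibration residuals}). \\
&P(R_{n+1} \le q_n) = \frac{k}{n+1}
  = \frac{\lceil (1-\alpha)(n+1) \rceil}{n+1} \ge 1 - \alpha .
\end{aligned}
```

The same calculation shows the bound is not wasteful: without ties, coverage is at most $1 - \alpha + 1/(n+1)$.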

Hysteresis factor

To stop the bound from flapping around a threshold, FrankenTUI applies a multiplicative hysteresis $\eta$ (default $1.1$):

\hat{y}^{++} = \eta \cdot (\hat{y} + q)

This widens the bound by 10%, trading a minuscule amount of over-coverage for stability.
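As a quick check of the arithmetic, here is the formula evaluated with the values that appear in the sample log line in the debugging section ($\hat{y} = 18.2$, $q = 4.1$, $\eta = 1.1$):

```latex
\hat{y}^{++} = \eta \cdot (\hat{y} + q) = 1.1 \cdot (18.2 + 4.1) = 1.1 \cdot 22.3 = 24.53
```

The unscaled bound would be $22.3$, so the hysteresis adds $2.23$ of slack in this case.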

E-process layer

For anytime-valid alerts on top of the conformal bound, pair with an e-process over normalised residuals $z_t = (R_t - q) / \sigma_0$:

e_t = \exp\!\left(\lambda z_t - \tfrac{1}{2}\lambda^2 \sigma_0^2\right)

Alert when the running product $\prod_{s \le t} e_s \ge 1/\alpha$. See e-processes for the theory.
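A minimal sketch of how the running product might be tracked, assuming the formula for $e_t$ exactly as written above. The `EProcess` type is hypothetical, and $\sigma_0$ is treated as a caller-supplied scale since no default is listed for it. Accumulating in log space avoids overflow on long runs.

```rust
/// Hypothetical running e-process over normalised residuals (illustration only).
pub struct EProcess {
    lambda: f64,
    sigma0: f64,
    log_threshold: f64, // ln(1 / alpha)
    log_product: f64,   // ln of the running product of e_s
}

impl EProcess {
    pub fn new(alpha: f64, lambda: f64, sigma0: f64) -> Self {
        assert!(alpha > 0.0 && alpha < 1.0, "alpha must lie in (0, 1)");
        assert!(sigma0 > 0.0, "sigma0 must be positive");
        Self {
            lambda,
            sigma0,
            log_threshold: (1.0 / alpha).ln(),
            log_product: 0.0,
        }
    }

    /// Fold in one residual R_t against the current quantile q.
    /// Returns true while the running product is at or above 1 / alpha.
    pub fn update(&mut self, residual: f64, q: f64) -> bool {
        let z = (residual - q) / self.sigma0;
        // ln e_t = lambda * z - 0.5 * lambda^2 * sigma0^2
        let log_e = self.lambda * z
            - 0.5 * self.lambda * self.lambda * self.sigma0 * self.sigma0;
        self.log_product += log_e;
        self.log_product >= self.log_threshold
    }
}
```

With $\alpha = 0.05$ the threshold is $\ln 20 \approx 3.0$, so a single moderately large residual does not fire the alert; evidence has to accumulate over several observations.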

Defaults

Parameter	Default
$\alpha$	0.05
$\lambda$	0.5
min calibration size	10
max calibration size	500
hysteresis $\eta$	1.1

Worked example — one-liner alert gate


// Sliding window of residuals |y_t - ŷ_t|.
let q = conformal.quantile(alpha); // lifted quantile R_(k), k = ⌈(1-α)(n+1)⌉; slightly above the 95th percentile if α=0.05
if observed > eta * (y_hat + q) {
    raise_alert();
}

On a stream of 200 frame-time residuals with $\alpha = 0.05$:

k = \lceil 0.95 \cdot 201 \rceil = 191, \qquad q = R_{(191)}

If the next frame exceeds $\eta(\hat{y} + q)$, the alert is valid in the conformal sense: under $H_0$ (the new residual is exchangeable with the calibration window), the false-alarm probability is at most $\alpha$.

Rust interface

crates/ftui-runtime/src/conformal_alert.rs


use ftui_runtime::conformal_alert::{ConformalAlert, ConformalConfig};
 
let mut alert = ConformalAlert::new(ConformalConfig {
    alpha: 0.05,
    lambda: 0.5,
    min_calibration: 10,
    max_calibration: 500,
    hysteresis: 1.1,
});
 
alert.observe(residual);           // push into calibration window
let bound = alert.upper_bound(y_hat);
 
if observed > bound {
    // conformal threshold breached
}

How to debug

The alert emits conformal_alert lines:


{"schema":"conformal_alert","y_hat":18.2,
 "q":4.1,"bound":24.53,"hysteresis":1.1,
 "observed":29.8,"triggered":true,
 "calibration_size":200,"alpha":0.05}


FTUI_EVIDENCE_SINK=/tmp/ftui.jsonl cargo run -p ftui-demo-showcase
 
# Empirical coverage over the run:
jq -c 'select(.schema=="conformal_alert")
       | {bound, observed, inside: (.observed <= .bound)}' \
  /tmp/ftui.jsonl | jq -r -s '"\(map(select(.inside)) | length)/\(length)"'

You should see coverage at or slightly above $1-\alpha$, since the logged bound includes the hysteresis factor. If it’s much lower, the calibration window is too small or stale; if it stays at 100% over a long run, the bound is too loose and $\alpha$ can be raised.

Pitfalls

Exchangeability is not i.i.d., but it isn’t free either. If the residual distribution drifts (new hardware, new workload), old residuals pollute the quantile. Keep the window short (max_calibration=500) and/or bucket by context — see Mondrian conformal for the bucketed version.

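Cold-start over-coverage. With $n = 10$ and $\alpha = 0.05$ the lifted rank is $\lceil 0.95 \cdot 11 \rceil = 11$, one past the largest of the 10 observations, so the strict conformal bound is $+\infty$. The rank first lands inside the window at $n = 19$ ($\lceil 0.95 \cdot 20 \rceil = 19$). Until then the bound is maximally conservative. That is the correct behaviour, but users can mistake it for a bug. Substituting the maximum of the 10 observations instead would guarantee only $10/11 \approx 91\%$ coverage, not 95%.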
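To see how large the window must be before the lifted rank lands on an actual observation, the condition $\lceil (1-\alpha)(n+1) \rceil \le n$ can be searched directly. The helper name `min_finite_window` is an assumption made for this sketch.

```rust
/// Smallest window size n for which ceil((1 - alpha)(n + 1)) <= n,
/// i.e. the first n at which the conformal bound is finite.
fn min_finite_window(alpha: f64) -> usize {
    assert!(alpha > 0.0 && alpha < 1.0, "alpha must lie in (0, 1)");
    let mut n = 1usize;
    loop {
        // The small tolerance guards against float round-up just above an integer.
        let k = ((1.0 - alpha) * (n as f64 + 1.0) - 1e-9).ceil() as usize;
        if k <= n {
            return n;
        }
        n += 1;
    }
}
```

For $\alpha = 0.05$ the search stops at $n = 19$, which is worth keeping in mind when choosing a minimum calibration size for a given $\alpha$.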

Cross-references

Mondrian conformal — bucketed version for frame-time gating.
Rank confidence — the palette’s tie-breaking layer.
E-processes — how the $e_t$ layer extends conformal into anytime-valid testing.

Where next

Intelligence overview: how this piece fits in intelligence.
Mondrian conformal: bucketed version for frame-time gating.