Vanilla Conformal Prediction
What goes wrong with a naive approach
You have a predictor for a quantity you care about — frame time, row height, allocation deltas — and you want an upper bound so the consumer can preallocate, gate, or alert safely. The textbook answers all fail in a TUI:
- Parametric — assumes Gaussian residuals. Real residuals are heavy-tailed (one pathological frame, a single multi-line row, a flushed writer) and the Gaussian bound under-covers exactly when you need it most.
- Empirical max — uses the worst-observed residual. Over-covers after one outlier forever; the bound never tightens.
- Hand-tuned percentile — “use the 95th percentile of the last N samples”. Works, but “finite-sample coverage” is a hope, not a guarantee; and no one in the codebase knows why it’s 95 instead of 90.
Conformal prediction (Vovk et al., 2005) gives you the hand-tuned percentile with a proof. The only assumption is exchangeability of calibration points, which is much weaker than i.i.d., and the coverage guarantee is finite-sample: it holds after any finite number $n$ of observations, not “in the limit”.
Mental model
Keep a rolling buffer of calibration residuals $R_i = |y_i - \hat{y}_i|$. On each prediction, quote the bound:

$$\hat{y}_{n+1} + \hat{q}$$

With probability at least $1 - \alpha$ over the next observation, $y_{n+1} \le \hat{y}_{n+1} + \hat{q}$. No distributional assumption beyond exchangeability. If residuals drift, the bound drifts with them.
Conformal prediction is the reverse of parametric uncertainty. Instead of asserting a noise model and computing a bound, you compute an empirical quantile and borrow the coverage guarantee from exchangeability. It is the most honest uncertainty estimator available for TUI workloads.
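The rolling buffer described above can be sketched in a few lines of Rust. This is an illustration only: `RollingConformal` and its methods are hypothetical names, not the `ftui_runtime` API, and clamping the quantile index to the window size is an assumption about small-window behaviour.

```rust
use std::collections::VecDeque;

/// Hypothetical rolling calibration window (not the ftui_runtime type).
struct RollingConformal {
    residuals: VecDeque<f64>,
    max_len: usize,
    alpha: f64,
}

impl RollingConformal {
    fn new(alpha: f64, max_len: usize) -> Self {
        assert!(max_len > 0, "window must hold at least one residual");
        Self { residuals: VecDeque::with_capacity(max_len), max_len, alpha }
    }

    /// Record |y - y_hat|, evicting the oldest residual once the window is full.
    fn observe(&mut self, y: f64, y_hat: f64) {
        if self.residuals.len() == self.max_len {
            self.residuals.pop_front();
        }
        self.residuals.push_back((y - y_hat).abs());
    }

    /// Empirical quantile of the window; None while the window is empty.
    fn quantile(&self) -> Option<f64> {
        let n = self.residuals.len();
        if n == 0 {
            return None;
        }
        let mut sorted: Vec<f64> = self.residuals.iter().copied().collect();
        sorted.sort_by(|a, b| a.total_cmp(b));
        // 1-based index ceil((n + 1)(1 - alpha)), clamped to n (assumption).
        let k = (((n + 1) as f64) * (1.0 - self.alpha)).ceil() as usize;
        Some(sorted[k.clamp(1, n) - 1])
    }

    /// Upper bound y_hat + q for the next observation.
    fn upper_bound(&self, y_hat: f64) -> Option<f64> {
        self.quantile().map(|q| y_hat + q)
    }
}
```

Because old residuals fall out of the front of the deque, a drift in the residual distribution shows up in the bound once enough new residuals have replaced the old ones.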
The math
Calibration and the lifted quantile
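Given $n$ calibration residuals $R_1, \dots, R_n$, the conformal quantile is the empirical $\lceil (n+1)(1-\alpha) \rceil / n$ quantile. The $(n+1)$ lift is what secures the finite-sample guarantee: it accounts for the test point as if it were also in the calibration set.

$$\hat{q} = R_{(k)}, \qquad k = \lceil (n+1)(1-\alpha) \rceil$$

(where $R_{(k)}$ is the $k$-th order statistic).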
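As a rough sketch of the lifted quantile, the two functions below compute the 1-based index $k = \lceil (n+1)(1-\alpha) \rceil$ and pick the matching order statistic. The function names are hypothetical, and returning `None` when $k$ exceeds the number of residuals is one possible convention (the strict reading, where the bound is undefined), not necessarily what the library does.

```rust
/// 1-based index k = ceil((n + 1)(1 - alpha)) of the order statistic used as
/// the conformal quantile. Returns None when k > n, i.e. when the calibration
/// set is too small for a finite bound at this alpha.
fn lifted_index(n: usize, alpha: f64) -> Option<usize> {
    assert!(alpha > 0.0 && alpha < 1.0, "alpha must be in (0, 1)");
    let k = (((n + 1) as f64) * (1.0 - alpha)).ceil() as usize;
    if k <= n { Some(k) } else { None }
}

/// Conformal quantile over a slice of residuals, or None if undefined.
fn conformal_quantile(residuals: &[f64], alpha: f64) -> Option<f64> {
    let k = lifted_index(residuals.len(), alpha)?;
    let mut sorted = residuals.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    Some(sorted[k - 1])
}
```

With 200 residuals and $\alpha = 0.05$ the index is 191 rather than the unlifted 190, which is the whole effect of the lift at that window size.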
Coverage theorem
If $R_1, \dots, R_n, R_{n+1}$ are exchangeable, then:

$$\Pr\left(R_{n+1} \le \hat{q}\right) \ge 1 - \alpha$$

Not asymptotic; exact for any $n$.
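The guarantee follows from a short rank argument, sketched here under two simplifying assumptions: the residuals have no ties, and the index $k = \lceil (n+1)(1-\alpha) \rceil$ does not exceed $n$. Write $R_{(k)}$ for the $k$-th smallest calibration residual and $\hat{q} = R_{(k)}$. Exchangeability makes the rank of the new residual $R_{n+1}$ among all $n+1$ residuals uniform on $\{1, \dots, n+1\}$, so:

```latex
\begin{align*}
\Pr\bigl(R_{n+1} \le \hat{q}\bigr)
  &= \Pr\bigl(\operatorname{rank}(R_{n+1}) \le k\bigr)
   = \frac{k}{n+1} \;\ge\; 1 - \alpha, \\
\frac{k}{n+1}
  &< \frac{(n+1)(1-\alpha) + 1}{n+1}
   = 1 - \alpha + \frac{1}{n+1}.
\end{align*}
```

The second line shows that the bound is not only valid but also tight: over-coverage is at most $1/(n+1)$. With ties the first equality becomes an inequality in the safe direction.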
Hysteresis factor
To stop the bound from flapping around a threshold, FrankenTUI applies a multiplicative hysteresis factor $\eta$ (default $\eta = 1.1$):

$$\text{bound} = \eta \cdot (\hat{y} + \hat{q})$$

This widens the bound by 10%, trading a minuscule amount of over-coverage for stability.
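A minimal sketch of the hysteresis step, using the same form as the alert gate in the worked example, `eta * (y_hat + q)`. The function names are hypothetical; the numbers in the usage note come from the sample log line in the debugging section.

```rust
/// Hysteresis-widened bound: eta * (y_hat + q).
fn widened_bound(y_hat: f64, q: f64, eta: f64) -> f64 {
    eta * (y_hat + q)
}

/// True when the observation exceeds the widened bound.
fn breaches(observed: f64, y_hat: f64, q: f64, eta: f64) -> bool {
    observed > widened_bound(y_hat, q, eta)
}
```

With `y_hat = 18.2`, `q = 4.1` and `eta = 1.1` the bound is about 24.53, so an observation of 23.0 (above the raw 22.3) does not trigger while 29.8 does. Because the factor multiplies the whole of `y_hat + q`, the extra slack scales with the prediction and not only with the quantile.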
E-process layer
For anytime-valid alerts on top of the conformal bound, pair with an e-process $E_t$ over normalised residuals $z_t$.
Alert when $E_t \ge 1/\alpha$. See e-processes for the theory.
Defaults
| Parameter | Default |
|---|---|
| $\alpha$ | 0.05 |
| $\lambda$ | 0.5 |
| min calibration size | 10 |
| max calibration size | 500 |
| hysteresis $\eta$ | 1.1 |
Worked example — one-liner alert gate
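    // Sliding window of residuals |y_t - ŷ_t|.
    let q = conformal.quantile(alpha); // lifted 95th percentile when α = 0.05
    if observed > eta * (y_hat + q) {
        raise_alert();
    }

On a stream of 200 frame-time residuals with $\alpha = 0.05$:

$$k = \lceil 201 \times 0.95 \rceil = \lceil 190.95 \rceil = 191, \qquad \hat{q} = R_{(191)}$$

If the next frame exceeds $\eta \cdot (\hat{y} + \hat{q})$, the alert is valid in the conformal sense: the false-alarm probability under $H_0$ (exchangeable residuals) is at most $\alpha$.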
Rust interface
    use ftui_runtime::conformal_alert::{ConformalAlert, ConformalConfig};

    let mut alert = ConformalAlert::new(ConformalConfig {
        alpha: 0.05,
        lambda: 0.5,
        min_calibration: 10,
        max_calibration: 500,
        hysteresis: 1.1,
    });

    alert.observe(residual); // push into calibration window
    let bound = alert.upper_bound(y_hat);
    if observed > bound {
        // conformal threshold breached
    }

How to debug
The alert emits conformal_alert lines:

    {"schema":"conformal_alert","y_hat":18.2,
     "q":4.1,"bound":24.53,"hysteresis":1.1,
     "observed":29.8,"triggered":true,
     "calibration_size":200,"alpha":0.05}

To capture them and check coverage:

    FTUI_EVIDENCE_SINK=/tmp/ftui.jsonl cargo run -p ftui-demo-showcase

    # Empirical coverage over the run, printed as inside/total:
    jq -c 'select(.schema=="conformal_alert")
           | {bound, observed, inside: (.observed <= .bound)}' \
      /tmp/ftui.jsonl \
      | jq -s '[(map(select(.inside)) | length), length] | "\(.[0])/\(.[1])"'

You should see roughly $1 - \alpha$ coverage (95% at the default $\alpha = 0.05$). If it’s much lower, the calibration window is too small or stale; if it’s 100%, the bound is too loose and $\alpha$ can be raised.
Pitfalls
Exchangeability is not i.i.d., but it isn’t free either. If the residual distribution drifts (new hardware, new workload), old residuals pollute the quantile. Keep the window short (max_calibration=500) and/or bucket by context — see Mondrian conformal for the bucketed version.
Cold-start over-coverage. With $n = 10$ the lifted quantile is $R_{(10)}$, the maximum of 10 observations. Until the window fills, the bound is extremely conservative. That is the correct behaviour, but users can mistake it for a bug.
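To see how long the cold-start regime lasts, ask when the lifted index $k = \lceil (n+1)(1-\alpha) \rceil$ first fits inside a window of $n$ residuals. The derivation below is a sketch. It assumes that for smaller windows the index is clamped to the largest residual $R_{(n)}$, which matches the description above but is not a confirmed implementation detail, and it assumes no ties.

```latex
\begin{align*}
\lceil (n+1)(1-\alpha) \rceil \le n
  &\iff (n+1)(1-\alpha) \le n
   \iff n \ge \frac{1-\alpha}{\alpha}, \\
\alpha = 0.05
  &\implies n \ge 19, \\
\Pr\bigl(R_{n+1} \le R_{(n)}\bigr)
  &= \frac{n}{n+1}
  \qquad \text{(e.g. } 10/11 \approx 0.909 \text{ at } n = 10\text{)}.
\end{align*}
```

At the default $\alpha$ the lifted index lands on an actual order statistic only from $n = 19$ onward. Below that, a clamped maximum is large relative to typical residuals, but on its own it carries a guarantee of $n/(n+1)$ rather than $1 - \alpha$; any additional margin comes from the hysteresis factor.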
Cross-references
- Mondrian conformal: bucketed version for frame-time gating.
- Rank confidence: the palette’s tie-breaking layer.
- E-processes: how the e-process layer extends conformal into anytime-valid testing.