# Command-Palette Evidence Ledger
## What goes wrong with a naive approach
A classical fuzzy search ranks candidates by an ad-hoc score: a prefix match gets +10, a word-boundary hit +5, each gap −1, tags add a bonus. The constants are picked by the author’s taste. The number “+10” has no units, no noise model, no way to combine it with a second piece of evidence other than hoping the scales line up. When a user complains that “open settings” was ranked below “opens ettings” you stare at the code and cannot even say why.
The palette needs three things the naive score can’t provide:
- A common currency for clues that disagree (a great prefix but bad gaps).
- An explanation for each ranking decision — which clue mattered, by how much.
- Stable ordering when scores tie, so top-k doesn’t flicker as the user types.
The first two come from Bayes factors in log-odds space. The third is handled by the rank-confidence layer on top.
## Mental model
Treat ranking as a hypothesis test: is this candidate the user’s intended command, or not?
- Start with prior odds that depend on the match type alone. An exact match is overwhelmingly likely (~99:1). A fuzzy subsequence is a long shot (~1:3).
- Each observed clue — a word-boundary hit, a tag match, a small position, a tight gap — is a Bayes factor (likelihood ratio). Each factor multiplies the posterior odds.
- The final log-posterior is the sort key. The list of `(description, log BF)` pairs is the ranking explanation.
The palette is a probabilistic classifier in disguise. The user types characters; we update a posterior over “which command did you mean?” and show the argmax-k with their evidence ledgers attached.
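That update-and-rank loop can be sketched in a few lines. This is a self-contained illustration, not the real `ftui_widgets` API: the `Clue`, `Candidate`, and `top_k` names are invented here, but the arithmetic — match-type log prior plus a sum of log Bayes factors, sorted descending — is exactly the scheme described above.

```rust
/// One clue about a candidate: a human-readable description plus a log Bayes factor.
struct Clue {
    desc: &'static str,
    log_bf: f64,
}

/// A candidate command with its match-type prior (as log-odds) and observed clues.
struct Candidate {
    id: &'static str,
    log_prior: f64,
    clues: Vec<Clue>,
}

impl Candidate {
    /// Posterior log-odds = prior log-odds + sum of log Bayes factors.
    fn log_posterior(&self) -> f64 {
        self.log_prior + self.clues.iter().map(|c| c.log_bf).sum::<f64>()
    }
}

/// Sort candidates by descending log posterior and keep the top k.
fn top_k(mut cands: Vec<Candidate>, k: usize) -> Vec<Candidate> {
    cands.sort_by(|a, b| b.log_posterior().partial_cmp(&a.log_posterior()).unwrap());
    cands.truncate(k);
    cands
}
```

Because the sort key is a plain sum, the `clues` vector of the winner *is* the explanation of why it won.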
## The math
Odds form of Bayes’ rule for relevance $R$ given evidence $E = (E_1, \dots, E_n)$:

$$
\frac{P(R \mid E)}{P(\neg R \mid E)} \;=\; \frac{P(R)}{P(\neg R)} \prod_{i=1}^{n} \frac{P(E_i \mid R)}{P(E_i \mid \neg R)}
$$
Taking logs turns the product into a sum — numerically stable and trivially auditable:

$$
\log \operatorname{odds}(R \mid E) \;=\; \underbrace{\log \operatorname{odds}(R)}_{\text{prior, from match type}} \;+\; \sum_{i=1}^{n} \underbrace{\log \mathrm{BF}_i}_{\text{one ledger entry each}}
$$
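Instantiated for the top candidate of the worked example below — a WordStart match (prior 4:1) with three clues:

$$
\log \operatorname{odds}(R \mid E) \;=\; \ln 4 + 0.70 + 0.30 + 1.00 \;\approx\; 1.386 + 2.00 \;=\; 3.386
$$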
### Prior odds by match type
| Match type | Prior odds | P(relevant) | Intuition |
|---|---|---|---|
| Exact | 99:1 | 0.99 | User typed the full command name. |
| Prefix | 9:1 | 0.90 | Stem of the name. |
| WordStart | 4:1 | 0.80 | Lines up with a word boundary. |
| Substring | 2:1 | 0.67 | Contiguous inside the name. |
| Fuzzy | 1:3 | 0.25 | Subsequence, nothing more. |
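The table maps to code along these lines. This is a sketch that mirrors the documented `prior_odds` values; the real `MatchType` enum ships in `ftui_widgets`, and its exact definition may differ.

```rust
/// Match types in decreasing order of prior relevance (sketch, not the shipped enum).
#[derive(Clone, Copy, Debug, PartialEq)]
enum MatchType {
    Exact,
    Prefix,
    WordStart,
    Substring,
    Fuzzy,
}

impl MatchType {
    /// Prior odds of relevance before any evidence is observed.
    fn prior_odds(self) -> f64 {
        match self {
            MatchType::Exact => 99.0,      // 99:1 -> P = 0.99
            MatchType::Prefix => 9.0,      //  9:1 -> P = 0.90
            MatchType::WordStart => 4.0,   //  4:1 -> P = 0.80
            MatchType::Substring => 2.0,   //  2:1 -> P ≈ 0.67
            MatchType::Fuzzy => 1.0 / 3.0, //  1:3 -> P = 0.25
        }
    }

    /// Log prior odds: the starting balance of the evidence ledger.
    fn log_prior(self) -> f64 {
        self.prior_odds().ln()
    }
}
```

Note that odds of `o:1` correspond to probability `o / (1 + o)`, which is where the second column of the table comes from.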
### Evidence factors (illustrative)
Each factor is computed once per candidate per keystroke. Typical clues:
- Word-boundary hit — +0.70 log-BF per boundary hit.
- Early position — +0.30 when the match starts at position 0; late starts go negative (−0.20 at position 9).
- Tight gap density — zero for a gap-free match; negative in the total gap (−0.80 for a gap sum of 4).
- Tag match — +1.00 when the query also matches a tag.
- Recent use — a positive log-BF for commands fired in the last session.
The signs are principled: clues that make relevance more likely have positive log-BF, clues that erode it are negative, and a missing clue contributes zero (not a made-up penalty).
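A sketch of how observed match features could become ledger entries. The `MatchFeatures` struct, the `evidence` function, and the thresholds here are illustrative — only the log-BF magnitudes are taken from the worked example below, and the shipped scorer may differ:

```rust
/// Features extracted from one candidate match (hypothetical shape).
struct MatchFeatures {
    boundary_hits: usize, // word-boundary characters hit by the query
    position: usize,      // index where the match starts
    gap_sum: usize,       // total characters skipped between matched chars
    tag_match: bool,      // query also matches one of the command's tags
}

/// Turn observed features into (description, log Bayes factor) entries.
/// A missing clue contributes nothing — there is no made-up penalty.
fn evidence(f: &MatchFeatures) -> Vec<(String, f64)> {
    let mut entries = Vec::new();
    if f.boundary_hits > 0 {
        entries.push((format!("boundary hits={}", f.boundary_hits),
                      0.70 * f.boundary_hits as f64));
    }
    if f.position == 0 {
        entries.push(("position=0".to_string(), 0.30));
    } else if f.position >= 5 {
        // Late starts erode relevance.
        entries.push((format!("position={}", f.position), -0.20));
    }
    if f.gap_sum > 0 {
        // Illustrative linear penalty: -0.20 per skipped character.
        entries.push((format!("gap_sum={}", f.gap_sum),
                      -0.20 * f.gap_sum as f64));
    }
    if f.tag_match {
        entries.push(("tag match".to_string(), 1.00));
    }
    entries
}
```

The empty-`Vec` case is the point: a candidate with no clues keeps exactly its match-type prior.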
## Worked example — typing `pal`
Consider three candidates matching pal:
- `palette.open` — WordStart, gap 0, tag `palette`.
- `terminal.palette` — Substring at position 9, gap 0.
- `unrelated.place` — Fuzzy, gaps totalling 4.
The ledger written to the evidence sink looks like (schema trimmed):
```json
{"schema":"match-evidence","id":"palette.open","match_type":"WordStart",
 "log_prior":1.386,"entries":[
  {"desc":"boundary hit",  "log_bf":0.70},
  {"desc":"position=0",    "log_bf":0.30},
  {"desc":"tag 'palette'", "log_bf":1.00}],
 "log_posterior":3.386,"rank":1}
{"schema":"match-evidence","id":"terminal.palette","match_type":"Substring",
 "log_prior":0.693,"entries":[
  {"desc":"position=9", "log_bf":-0.20}],
 "log_posterior":0.493,"rank":2}
{"schema":"match-evidence","id":"unrelated.place","match_type":"Fuzzy",
 "log_prior":-1.099,"entries":[
  {"desc":"gap_sum=4", "log_bf":-0.80}],
 "log_posterior":-1.899,"rank":3}
```

Reading the ledger: the first two clues on `palette.open` alone push its posterior past a Substring match with no bonuses, and the tag match turns it into a clean lead. The sort key is the final log posterior, but you can explain the lead to a user with one glance at the `entries` array.
## Rust interface
```rust
use ftui_widgets::command_palette::{EvidenceEntry, EvidenceLedger, MatchType};

let mut ledger = EvidenceLedger::new(MatchType::WordStart);
ledger.add(EvidenceEntry::new("boundary hit", 0.70));
ledger.add(EvidenceEntry::new("position=0", 0.30));
ledger.add(EvidenceEntry::new("tag 'palette'", 1.00));

let log_posterior = ledger.log_posterior(); // = log_prior + Σ log_bf
let p_relevant = ledger.posterior();        // sigmoid(log_posterior)
```

Priors come from the match type:

```rust
// MatchType::prior_odds(self) -> f64
let prior_odds = MatchType::WordStart.prior_odds(); // 4.0
let log_prior = prior_odds.ln();                    // ≈ 1.386
```

Every `EvidenceEntry` carries a human-readable description, so the ledger doubles as the “why is this ranked here?” explanation. The palette widget surfaces this through /widgets/command-palette — a debug overlay renders the top-k entries with their log-BF bars.
## How to debug
Enable the evidence sink and filter to `match-evidence`:

```sh
FTUI_EVIDENCE_SINK=/tmp/ftui.jsonl cargo run -p ftui-demo-showcase
# Ledgers for the last search:
jq -c 'select(.schema=="match-evidence")' /tmp/ftui.jsonl | tail -20
```

Pinpointing a surprise: if a command should be ranked first but isn’t, the ledger tells you whether the prior was wrong (match-type misclassified) or a specific Bayes factor was missing:
```sh
# Find candidates where the tag clue was skipped:
jq -c 'select(.schema=="match-evidence" and (
  [.entries[] | .desc] | contains(["tag"]) | not
))' /tmp/ftui.jsonl
```

## Pitfalls
**Don’t inflate priors to paper over weak evidence.** If Fuzzy candidates keep winning, raising Fuzzy’s prior from 1:3 to 1:1 will also let them beat real matches during noisy typing. Instead, add the missing evidence factor (e.g., a penalty for long gap runs) — the ledger stays honest and the posterior calibrates itself.
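A numeric sketch of that failure, reusing the factor values from the worked example (the scenario itself is hypothetical): a noisy Fuzzy candidate that happens to hit one word boundary, against a legitimate Substring match at position 9.

```rust
/// Log posteriors for three scorings:
/// - a Substring match at position 9 (prior 2:1, late-position penalty),
/// - a noisy Fuzzy match scored honestly (prior 1:3, boundary bonus, gap penalty),
/// - the same Fuzzy match with an inflated 1:1 prior and the gap factor left out.
fn pitfall_demo() -> (f64, f64, f64) {
    let substring = 2.0f64.ln() - 0.20;                    // ≈  0.493
    let fuzzy_honest = (1.0f64 / 3.0).ln() + 0.70 - 0.80;  // ≈ -1.199
    let fuzzy_inflated = 1.0f64.ln() + 0.70;               // =  0.70
    (substring, fuzzy_honest, fuzzy_inflated)
}
```

With honest scoring the Substring match stays ahead by a wide margin; inflating the prior while skipping the gap factor flips the order, exactly the regression the pitfall warns about.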
**Independence is an approximation.** The Bayes-factor product assumes clues are conditionally independent given relevance $R$. They are not: a word-boundary hit at position 0 correlates with a Prefix match type. The palette compensates by keeping factor magnitudes small (mostly $|\log \mathrm{BF}| \le 1$) so double-counting is bounded.
## Cross-references
- Rank confidence — the conformal layer that breaks ties deterministically.
- `/widgets/command-palette` — the consuming widget’s API and UX.
- `/runtime/evidence-sink` — how `match-evidence` lines are emitted and rotated.