Text Shaping
Shaping is the step that converts text into positioned glyphs. For
most ASCII content in a monospace terminal, shaping is almost the
identity function. For scripts with contextual forms (Arabic), ligatures
(fi, ffi), cursive joining, or mark-to-base attachments (Thai,
Devanagari), shaping is the difference between legible text and
visual garbage.
ftui_text::shaping handles it in two layers: a segmentation layer
(splits text into monodirectional, monoscript runs) and a shaping
layer (turns each run into glyphs, with a cache in front).
The pipeline, zoomed in
BidiSegment (from bidi.rs)
        │
        ▼
┌──────────────────────────────┐
│ script_segmentation.rs       │  per-script runs
│   → Vec<TextRun>             │
└──────────────────────────────┘
        │
        ▼
┌──────────────────────────────┐
│ ShapingKey                   │  text + script + direction +
│ (hashes the whole run)       │  style + font + size + features
└──────────────────────────────┘
        │
        ▼  cache lookup
┌──────────────────────────────┐
│ ShapingCache (LRU, 1024)     │
│   hit → ShapedRun            │
└──────────────────────────────┘
        │  miss
        ▼
┌──────────────────────────────┐
│ TextShaper trait             │
│   NoopShaper (ASCII/monospace│
│     fallback)                │
│   RustybuzzShaper (full      │
│     OpenType via rustybuzz)  │
└──────────────────────────────┘
        │
        ▼
ShapedRun { glyphs, clusters, advances }
TextRun — the unit of shaping
Script segmentation produces TextRuns. Each run has:
pub struct TextRun {
    pub text: String,
    pub script: Script,          // Latin, Greek, Cyrillic, Arabic, Hebrew,
                                 // Thai, CJK, …
    pub direction: RunDirection, // Ltr | Rtl
    pub start: usize,            // logical start in the parent string
    pub end: usize,
}
Runs are monoscript by construction. The shaper gets to assume every character in the run uses the same shaping rules.
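To make "monoscript by construction" concrete, here is a minimal sketch of script segmentation. It is not the code in script_segmentation.rs: the `Script`, `Run`, `script_of`, and `segment` names are hypothetical, the script lookup covers only a few Unicode blocks, and direction is assumed to be resolved upstream by the bidi stage. Neutral characters (spaces, digits, punctuation) are folded into the neighbouring run so they do not split it.

```rust
// Hypothetical stand-ins for the real ftui_text types; names and fields are assumptions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Script {
    Latin,
    Greek,
    Cyrillic,
    Hebrew,
    Arabic,
    Common, // spaces, digits, punctuation: no script of their own
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Run {
    pub script: Script,
    pub start: usize, // byte offset into the parent string
    pub end: usize,   // exclusive byte offset
}

/// Coarse script lookup over a handful of Unicode blocks (illustrative only).
pub fn script_of(c: char) -> Script {
    match c as u32 {
        0x0041..=0x005A | 0x0061..=0x007A | 0x00C0..=0x024F => Script::Latin,
        0x0370..=0x03FF => Script::Greek,
        0x0400..=0x04FF => Script::Cyrillic,
        0x0590..=0x05FF => Script::Hebrew,
        0x0600..=0x06FF => Script::Arabic,
        _ => Script::Common,
    }
}

/// Split `text` into runs in which every non-neutral character shares one script.
pub fn segment(text: &str) -> Vec<Run> {
    let mut runs: Vec<Run> = Vec::new();
    for (i, c) in text.char_indices() {
        let end = i + c.len_utf8();
        let script = script_of(c);
        if let Some(last) = runs.last_mut() {
            // Same script, or a neutral character: extend the current run.
            if script == Script::Common || script == last.script {
                last.end = end;
                continue;
            }
            // A run that so far holds only neutrals adopts the first real script.
            if last.script == Script::Common {
                last.script = script;
                last.end = end;
                continue;
            }
        }
        // Script changed (or this is the first character): start a new run.
        runs.push(Run { script, start: i, end });
    }
    runs
}
```

Because the offsets are byte offsets into the parent string, `&text[run.start..run.end]` recovers the text of each run without copying.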
ShapingKey — the cache key that makes hits cheap
A shaping cache hit requires identical inputs. The key packs all of them:
ShapingKey {
    text:      (hashed)
    script:    Script
    direction: RunDirection
    style:     Style                       // bold/italic affects glyph choice
    font:      FontIdent
    size:      u16                         // in 8ths of a cell or similar
    features:  SmallVec<[FeatureTag; 4]>   // OpenType tags
}
Two identical keys → same shaped output. That’s the whole contract of the cache.
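The contract can be sketched with a key that derives `Hash` and `Eq` and stores a hash of the run text instead of the text. Everything here is an assumption made for illustration: the `Key` struct, the `hash_text` and `key_for` helpers, the `font_id: u32` stand-in for `FontIdent`, and the omission of `Style`. A plain `Vec` replaces the `SmallVec` so the sketch needs only the standard library.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

pub type FeatureTag = [u8; 4];

// Hypothetical stand-ins; the real ShapingKey fields and types may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Script {
    Latin,
    Arabic,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum RunDirection {
    Ltr,
    Rtl,
}

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Key {
    pub text_hash: u64, // hash of the run text, not the text itself
    pub script: Script,
    pub direction: RunDirection,
    pub font_id: u32, // stands in for FontIdent
    pub size: u16,
    pub features: Vec<FeatureTag>, // kept sorted so tag order cannot cause misses
}

/// Hash the run text once; the key carries only the 64-bit digest.
pub fn hash_text(text: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    text.hash(&mut hasher);
    hasher.finish()
}

/// Build a key, normalising the feature list so equal sets compare equal.
pub fn key_for(
    text: &str,
    script: Script,
    direction: RunDirection,
    font_id: u32,
    size: u16,
    features: &[FeatureTag],
) -> Key {
    let mut features = features.to_vec();
    features.sort_unstable();
    features.dedup();
    Key { text_hash: hash_text(text), script, direction, font_id, size, features }
}
```

Sorting the feature tags is a design choice of this sketch: without it, `[liga, calt]` and `[calt, liga]` would describe the same shaping request but produce two different keys.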
OpenType features — why b"liga"
FontFeatures is a small vec of 4-byte OpenType tags:
b"liga"            // standard ligatures (fi, fl)
b"kern"            // kerning
b"calt"            // contextual alternates
b"ss01"            // stylistic set 01
b"liga" + b"calt"  // programmer-font ligatures (->, =>)
You enable features per shaping pass. Disabling liga on a code-editing widget is the conventional choice; enabling it on a reading widget is the other conventional choice.
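A small sketch of choosing features per widget follows. The `WidgetKind` enum, the `features_for` function, and the exact feature sets are assumptions for illustration; only the 4-byte tag representation and the liga convention come from the text above.

```rust
/// An OpenType feature tag is four ASCII bytes.
pub type FeatureTag = [u8; 4];

pub const LIGA: FeatureTag = *b"liga"; // standard ligatures
pub const KERN: FeatureTag = *b"kern"; // kerning

// Hypothetical widget categories, used only to pick a feature set.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WidgetKind {
    CodeEditor,
    Reader,
}

/// Feature set for one shaping pass: ligatures off for code, on for reading.
pub fn features_for(kind: WidgetKind) -> Vec<FeatureTag> {
    match kind {
        WidgetKind::CodeEditor => vec![KERN],
        WidgetKind::Reader => vec![KERN, LIGA],
    }
}

/// Render a tag for logs; falls back to "????" if the bytes are not valid UTF-8.
pub fn tag_name(tag: &FeatureTag) -> &str {
    std::str::from_utf8(tag).unwrap_or("????")
}
```

Since the feature list is part of the shaping key, the same run shaped for a code editor and for a reader occupies two separate cache entries.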
The LRU cache
ShapingCache {
    capacity:   1024 (default)
    entries:    LruCache<ShapingKey, ShapedRun>
    generation: u32    // bumps when fonts change
}
Three properties matter:
- Bounded. The default of 1024 entries is enough for a realistic UI: a terminal running a code editor rarely produces more unique shaping keys in a frame than the cache can hold.
- O(1). LRU lookup, insert, and eviction are all O(1).
- Generation-based invalidation. When a font changes (DPR change, zoom, theme swap), the cache bumps its generation counter. Entries older than the current generation are treated as misses and lazily evicted on their next access. No walk-the-whole-cache clear is needed.
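Generation-based invalidation can be sketched in a few lines. This is not the `ShapingCache` implementation: `GenCache` is a hypothetical type built on a plain `HashMap`, and LRU ordering and the capacity bound are deliberately left out so that only the stamping scheme is visible.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Generation-stamped cache (hypothetical; no LRU ordering or capacity bound).
pub struct GenCache<K, V> {
    generation: u32,
    entries: HashMap<K, (u32, V)>, // each value carries the generation that produced it
}

impl<K: Hash + Eq, V> GenCache<K, V> {
    pub fn new() -> Self {
        Self { generation: 0, entries: HashMap::new() }
    }

    /// Invalidate everything in O(1): no entry is touched here.
    pub fn bump_generation(&mut self) {
        self.generation = self.generation.wrapping_add(1);
    }

    /// Store a value stamped with the current generation.
    pub fn insert(&mut self, key: K, value: V) {
        self.entries.insert(key, (self.generation, value));
    }

    /// Return the value only if it was produced in the current generation.
    pub fn get(&mut self, key: &K) -> Option<&V> {
        let stale = match self.entries.get(key) {
            Some((stamp, _)) => *stamp != self.generation,
            None => return None,
        };
        if stale {
            // Lazy eviction: the stale entry is dropped on its next access.
            self.entries.remove(key);
            return None;
        }
        self.entries.get(key).map(|(_, value)| value)
    }

    /// Number of stored entries, including stale ones not yet evicted.
    pub fn len(&self) -> usize {
        self.entries.len()
    }
}
```

After `bump_generation`, `len` still counts the old entries; each one disappears the first time it is looked up, which is the lazy eviction described in the list above.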
Two shapers, one trait
pub trait TextShaper {
    fn shape(&self, run: &TextRun, key: &ShapingKey) -> ShapedRun;
}
- NoopShaper: the ASCII / fixed-pitch fallback. One glyph per char, no ligatures, no contextual forms. Used when rustybuzz is disabled or for pure-ASCII fast paths.
- RustybuzzShaper: wraps rustybuzz for real OpenType shaping. Used for complex scripts and when ligatures are enabled.
The renderer picks the shaper per run, based on the run’s script and the enabled features: Arabic runs get the full shaper, Latin runs with ligatures off get the noop.
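The selection rule can be sketched as follows. All names here (`Shaper`, `Noop`, `Full`, `pick_shaper`, the two-variant `Script`) are hypothetical, the trait is simplified to return a bare glyph list, and `Full` is only a placeholder that reuses the noop output where a real implementation would call into rustybuzz.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Script {
    Latin,
    Arabic,
}

pub struct Run<'a> {
    pub text: &'a str,
    pub script: Script,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Glyph {
    pub ch: char,
    pub advance: u16, // in cells
}

/// Simplified stand-in for the TextShaper trait.
pub trait Shaper {
    fn name(&self) -> &'static str;
    fn shape(&self, run: &Run<'_>) -> Vec<Glyph>;
}

/// One glyph per char, one cell each; wide characters are ignored in this sketch.
pub struct Noop;

impl Shaper for Noop {
    fn name(&self) -> &'static str {
        "noop"
    }
    fn shape(&self, run: &Run<'_>) -> Vec<Glyph> {
        run.text.chars().map(|ch| Glyph { ch, advance: 1 }).collect()
    }
}

/// Placeholder for an OpenType shaper; it does no real shaping here.
pub struct Full;

impl Shaper for Full {
    fn name(&self) -> &'static str {
        "full"
    }
    fn shape(&self, run: &Run<'_>) -> Vec<Glyph> {
        Noop.shape(run)
    }
}

/// Latin with ligatures off takes the cheap path; everything else is fully shaped.
pub fn pick_shaper<'s>(
    run: &Run<'_>,
    ligatures: bool,
    noop: &'s dyn Shaper,
    full: &'s dyn Shaper,
) -> &'s dyn Shaper {
    if run.script == Script::Latin && !ligatures {
        noop
    } else {
        full
    }
}
```

Both shapers sit behind the same trait object, so the caching code never needs to know which one produced a run.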
ShapedRun — what comes out
pub struct ShapedRun {
    pub glyphs: Vec<Glyph>,        // positioned glyphs
    pub clusters: Vec<ClusterMap>, // grapheme-to-glyph mapping
    /* advances, bounds, etc. */
}
ClusterMap is the bridge back to logical text: for each grapheme in the input run, it records which glyph(s) render it. It is essential for cursor positioning and hit-testing.
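As an illustration of how a cluster map supports cursor positioning and hit-testing, here is a sketch under simplifying assumptions: the `Glyph` and `Cluster` shapes are hypothetical, the run is left-to-right, and each grapheme owns a contiguous, non-overlapping range of glyphs (so ligatures that merge several graphemes into one glyph are not modelled).

```rust
/// Hypothetical glyph: only the horizontal advance (in cells) matters here.
pub struct Glyph {
    pub advance: u16,
}

/// Hypothetical cluster entry: the glyph range that renders one grapheme.
pub struct Cluster {
    pub glyph_start: usize, // inclusive
    pub glyph_end: usize,   // exclusive
}

/// X offset of the cursor placed before grapheme `index`.
pub fn cursor_x(glyphs: &[Glyph], clusters: &[Cluster], index: usize) -> Option<u32> {
    let cluster = clusters.get(index)?;
    let before = glyphs.get(..cluster.glyph_start)?;
    Some(before.iter().map(|g| u32::from(g.advance)).sum())
}

/// Index of the grapheme whose glyphs cover horizontal position `x`.
pub fn hit_test(glyphs: &[Glyph], clusters: &[Cluster], x: u32) -> Option<usize> {
    let mut left = 0u32;
    for (i, cluster) in clusters.iter().enumerate() {
        let span = glyphs.get(cluster.glyph_start..cluster.glyph_end)?;
        let width: u32 = span.iter().map(|g| u32::from(g.advance)).sum();
        if x < left + width {
            return Some(i);
        }
        left += width;
    }
    None // x lies past the end of the run
}
```

A base character with a zero-advance combining mark is the typical case: two glyphs, one grapheme, and the cursor must skip over both.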
Worked example — sketch of the cache path
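use ftui_text::shaping::{ShapingCache, ShapingKey, TextShaper};
// TextRun and ShapedRun must also be in scope; their module paths are not shown here.

fn shape_or_cache(
    cache: &mut ShapingCache,
    shaper: &dyn TextShaper,
    run: &TextRun,
) -> ShapedRun {
    // Build the key from the run plus the current style, font, size, and features.
    let key = ShapingKey::for_run(run, /* style, font, size, features */);

    // Hit: hand back a clone of the cached result.
    if let Some(hit) = cache.get(&key) {
        return hit.clone();
    }

    // Miss: shape once, store a copy, and return the fresh result.
    let shaped = shaper.shape(run, &key);
    cache.insert(key, shaped.clone());
    shaped
}
In steady state (scrolling a code file, typing into a textarea), the vast majority of calls land on the cache hit path.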
When do entries actually get recomputed?
Entries get recomputed, through a generation bump or a changed key, when:
- The font set loaded into the process changes.
- The DPR / zoom level changes (if the caller wires it into the key).
- The ASCII fast-path opts out (a new grapheme with multiple codepoints appears and forces the real shaper).
In ordinary use, the generation does not bump for seconds at a time, and the hit rate stays very high.
Pitfalls
Don’t put user-typed Unicode directly in the key. Hash it. The cache is keyed by hash, not by string, specifically to keep the key small and the lookup cheap.
Clearing the cache is almost always wrong. Bump the generation
instead. A manual clear() defeats the LRU’s amortized behavior and
causes a cold-cache stall on the next frame.
Shaping is not layout. The shaper tells you glyph positions within a run. It does not wrap lines. Line wrapping lives at the widget layer on top of the width cache; see grapheme / width.