lean-ctx goes beyond pattern matching. Six algorithms from information theory, graph theory, and statistical mechanics work together to decide what to keep, what to cut, and where to put it - all automatically, all locally, zero configuration.
These algorithms power the autonomous intelligence layer. They activate during every ctx_read, ctx_preload, and ctx_overview call - you never invoke them directly.
How the layers interact
| Algorithm | Decides | Based on |
|---|---|---|
| Spectral Relevance | Which files matter | Dependency graph + PageRank |
| Boltzmann Allocation | How many tokens per file | Task specificity + relevance score |
| Predictive Surprise | Which lines to keep | Cross-entropy of BPE tokens |
| MMR Deduplication | Which lines are redundant | Bigram Jaccard similarity |
| Semantic Chunking | How to order the output | AST boundaries + attention flow |
| BPE Optimization | How to encode the final text | Token-level compression rules |
Predictive Surprise Scoring
Traditional compression filters use Shannon entropy over raw characters, which says nothing about what an LLM actually finds predictable - import looks character-diverse even though it is pure boilerplate to the model. Predictive Surprise instead measures cross-entropy against BPE token frequency: how surprising each line is to the LLM's tokenizer.
How it works
- Each line is tokenized using o200k_base (GPT-4o / Claude tokenizer).
- For each token, a Zipfian prior estimates expected frequency based on rank.
- The cross-entropy is computed: lines with common tokens (boilerplate) score low; lines with rare tokens (complex logic) score high.
- Lines below the surprise threshold are candidates for removal during aggressive compression.
Example
// Low surprise (boilerplate) - candidate for removal:
return Ok(()); // surprise: 0.12
use std::collections::HashMap; // surprise: 0.18
// High surprise (unique logic) - always preserved:
let decay = (-alpha * dist as f64).exp(); // surprise: 0.89
scores[j] += heat[i] * decay / degree; // surprise: 0.94
Why it matters
Predictive Surprise removes 15–30% more boilerplate than character-level entropy while preserving complex logic. It directly models what the LLM considers "informative" because it uses the same BPE vocabulary.
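As a rough illustration, the scoring idea can be sketched with a whitespace tokenizer and a made-up rank table - the real implementation tokenizes with o200k_base and derives ranks from its vocabulary, so every constant below is an assumption:

```python
import math

# Hypothetical rank table: lower rank = more common token. Sketch only;
# the real implementation derives ranks from the o200k_base vocabulary.
TOKEN_RANK = {"return": 1, "Ok": 2, "use": 3, "std": 4, "let": 50,
              "decay": 900, "alpha": 700, "exp": 400}
DEFAULT_RANK = 1000  # unseen tokens are treated as rare

def zipf_prob(rank: int, vocab: int = 2000) -> float:
    """Zipfian prior: P(token) proportional to 1/rank, normalized over the vocab."""
    harmonic = sum(1.0 / r for r in range(1, vocab + 1))
    return (1.0 / rank) / harmonic

def surprise(line: str) -> float:
    """Mean cross-entropy (bits) of the line's tokens under the Zipfian prior."""
    tokens = line.split()
    if not tokens:
        return 0.0
    bits = [-math.log2(zipf_prob(TOKEN_RANK.get(t, DEFAULT_RANK))) for t in tokens]
    return sum(bits) / len(bits)

# Boilerplate tokens are high-rank (common), so cross-entropy is low;
# rare identifiers drive the score up.
print(f"boilerplate: {surprise('return Ok'):.2f} bits")
print(f"logic:       {surprise('let decay = alpha exp'):.2f} bits")
```

Lines whose mean surprise falls below the threshold become removal candidates, exactly as the steps above describe.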
Spectral Relevance Propagation
When you ask lean-ctx to preload context for a task, it needs to decide which files matter. Simple keyword matching only finds files that mention your search terms. Spectral Relevance finds files that are structurally connected to the relevant ones - even if they share no keywords.
How it works
Two graph algorithms run on the project's dependency graph:
- Heat Diffusion - Seed files matching the task description receive initial "heat". Heat spreads along import edges with exponential decay, simulating information flow through the codebase.
- PageRank Centrality - Files imported by many other files receive higher centrality scores. This identifies structural hubs (e.g., a db.ts imported by 20 modules).
The final relevance score combines both: 0.7 × heat + 0.3 × pagerank. Files below a threshold are excluded from preloading.
Example
Task: "fix authentication bug"
Direct matches: auth.ts, login.ts
Heat diffusion adds: middleware.ts (imports auth.ts)
session.ts (imported by auth.ts)
PageRank promotes: db.ts (imported by 18 files - structural hub)
Result: 5 files preloaded instead of 2, covering the full blast radius.
Zero configuration
The dependency graph builds automatically on first use via load_or_build(). No ctx_graph build needed - it just works.
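The two graph passes and the 0.7 / 0.3 blend can be sketched as follows. The toy dependency graph, decay constant, iteration counts, and bidirectional heat flow are illustrative assumptions, not lean-ctx's actual implementation:

```python
DEPS = {  # file -> files it imports (assumed toy graph)
    "auth.ts": ["session.ts", "db.ts"],
    "login.ts": ["auth.ts", "db.ts"],
    "middleware.ts": ["auth.ts"],
    "routes.ts": ["middleware.ts", "db.ts"],
    "db.ts": [],
    "session.ts": ["db.ts"],
}

def heat_diffusion(seeds, steps=3, decay=0.5):
    """Seed files get heat 1.0; heat spreads along import edges (both
    directions in this sketch) with exponential decay, then is normalized."""
    heat = {f: (1.0 if f in seeds else 0.0) for f in DEPS}
    for _ in range(steps):
        nxt = dict(heat)
        for f, imports in DEPS.items():
            for g in imports:
                nxt[g] += decay * heat[f]
                nxt[f] += decay * heat[g]
        total = sum(nxt.values())
        heat = {f: h / total for f, h in nxt.items()}
    return heat

def pagerank(d=0.85, iters=30):
    """Plain power-iteration PageRank; dangling-node mass is ignored here."""
    n = len(DEPS)
    pr = {f: 1.0 / n for f in DEPS}
    for _ in range(iters):
        nxt = {f: (1 - d) / n for f in DEPS}
        for f, imports in DEPS.items():
            share = pr[f] / len(imports) if imports else 0.0
            for g in imports:
                nxt[g] += d * share  # importing g endorses g
        pr = nxt
    return pr

heat = heat_diffusion(seeds={"auth.ts", "login.ts"})
pr = pagerank()
score = {f: 0.7 * heat[f] + 0.3 * pr[f] for f in DEPS}
for f in sorted(score, key=score.get, reverse=True):
    print(f"{f:15s} {score[f]:.3f}")
```

Note how db.ts, which matches no keyword, still scores well: it is a structural hub, so PageRank lifts it above leaf files like routes.ts.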
Boltzmann Context Allocation
Once Spectral Relevance selects the relevant files, each file needs a token budget. Naive approaches give equal budgets or sort by relevance. Boltzmann Allocation uses statistical mechanics to distribute tokens optimally.
How it works
- Each file has a relevance score Ei from Spectral Relevance.
- An inverse-temperature parameter β is derived from the task: specific tasks ("fix auth bug in login.ts") produce a high β, i.e., low temperature (sharp allocation to top files); broad tasks ("refactor the codebase") produce a low β, i.e., high temperature (even distribution).
- Token budgets follow the Boltzmann distribution: budget_i = total × e^(β·E_i) / Σ_j e^(β·E_j)
- Minimum and maximum budgets are enforced (128–4096 tokens per file), and the compression mode (full, map, signatures) is chosen based on the allocated budget.
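The distribution above is a softmax over relevance scores. A minimal sketch, using the example's β = 4.2 and the 128–4096 clamp; the mode thresholds in the loop are assumptions for illustration:

```python
import math

def boltzmann_allocate(relevance, total, beta, lo=128, hi=4096):
    """Softmax (Boltzmann) token allocation with per-file clamping."""
    z = sum(math.exp(beta * e) for e in relevance.values())
    budgets = {}
    for f, e in relevance.items():
        raw = total * math.exp(beta * e) / z
        budgets[f] = max(lo, min(hi, int(raw)))  # enforce 128-4096 range
    return budgets

rel = {"auth.ts": 0.92, "middleware.ts": 0.78, "session.ts": 0.45,
       "db.ts": 0.31, "routes.ts": 0.22}
budgets = boltzmann_allocate(rel, total=8000, beta=4.2)
for f, b in budgets.items():
    # Hypothetical mode cutoffs, not lean-ctx's actual thresholds:
    mode = "full" if b > 1500 else "map" if b > 300 else "signatures"
    print(f"{f:15s} {b:5d} {mode}")
```

A higher β concentrates the budget on the top files; as β approaches 0 the allocation flattens toward total / n per file.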
Example
Task: "fix the JWT validation in auth middleware"
Task specificity: 0.85 (specific) → β = 4.2
File | Relevance | Budget | Mode
auth.ts | 0.92 | 3,200 | full
middleware.ts | 0.78 | 1,800 | full
session.ts | 0.45 | 420 | map
db.ts | 0.31 | 180 | signatures
routes.ts | 0.22 | 128 | signatures
─────
Total: 5,728 tokens (within 8,000 budget)
Reciprocal Rank Fusion (RRF) Cache Eviction
lean-ctx uses Reciprocal Rank Fusion for cache eviction decisions, replacing the earlier Boltzmann-inspired weighted scoring. RRF handles incomparable signals (time in seconds, frequency as counts, size in tokens) without arbitrary weight tuning.
| Aspect | Legacy (Boltzmann) | Current (RRF) |
|---|---|---|
| Signal handling | Mixed units in one formula (0.4×recency + 0.3×frequency + 0.3×size) | Each signal ranked independently, then fused |
| Weight tuning | Requires manual weight calibration (arbitrary 0.4/0.3/0.3) | No weights - only K=60 (standard IR parameter) |
| Edge cases | Large files dominate due to log-scaling of size | Fair treatment - each signal contributes equally via rank |
Formula: RRF(d) = Σ_i 1/(K + rank_i(d)) - entries with the lowest fused score are evicted first. This produces monotonically correct ordering regardless of signal magnitude differences.
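A sketch of the fusion, assuming "recently used", "frequently used", and "small" each count as reasons to keep an entry - the exact signal directions are not specified above, so they are assumptions here:

```python
def ranks(values, reverse=False):
    """Map each key to its 1-based rank; reverse=True ranks larger values first."""
    order = sorted(values, key=values.get, reverse=reverse)
    return {name: i + 1 for i, name in enumerate(order)}

def rrf(entries, k=60):
    """entries: {name: (seconds_since_use, access_count, size_tokens)}.
    Each signal is ranked independently (rank 1 = strongest reason to keep),
    then fused as RRF = sum 1/(K + rank). Lowest fused score evicts first."""
    recency = ranks({n: v[0] for n, v in entries.items()})             # recent first
    freq = ranks({n: v[1] for n, v in entries.items()}, reverse=True)  # frequent first
    size = ranks({n: v[2] for n, v in entries.items()})                # small first (assumed)
    return {n: sum(1.0 / (k + r[n]) for r in (recency, freq, size))
            for n in entries}

cache = {
    "auth.ts": (30, 12, 3200),     # fresh, hot
    "utils.ts": (600, 2, 900),     # stale, cold
    "vendor.ts": (900, 1, 40000),  # stale, cold, huge
}
scores = rrf(cache)
evict_first = min(scores, key=scores.get)
print(evict_first)
```

Because only ranks enter the formula, a 40,000-token file and a 900-token file differ by at most one rank position per signal - no unit mixing, no weight tuning.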
Semantic Chunking with Attention Bridges
LLMs suffer from the "Lost in the Middle" problem: information at the beginning and end of context receives more attention than content in the middle. Semantic Chunking restructures output to minimize information loss.
How it works
- Chunk Detection - Source lines are grouped into semantic chunks (functions, types, imports, loose logic) based on AST boundary heuristics.
- Relevance Ordering - Chunks matching the current task are promoted to the top (high-attention position). Remaining chunks are ordered by type priority.
- Attention Bridges - Between chunks, lean-ctx inserts minimal bridge markers (---) so the LLM recognizes structural boundaries.
- Tail Anchors - The last 2–3 lines of the highest-priority chunk are repeated at the very end of the output, exploiting the recency bias for critical information.
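Steps 2–4 above can be sketched as follows; chunk detection itself (step 1, AST boundaries) is elided, and the chunks arrive pre-split:

```python
def reorder(chunks, task_terms):
    """Promote chunks mentioning the task, join with '---' bridges, and
    repeat the top chunk's last line as a tail anchor (sketch only)."""
    def hits(chunk):
        return sum(term in chunk for term in task_terms)
    # Stable sort: task-matching chunks first, others keep their order.
    ordered = sorted(chunks, key=hits, reverse=True)
    body = "\n---\n".join(ordered)
    anchor = ordered[0].splitlines()[-1]
    return body + "\n---\n// anchor: " + anchor

chunks = [
    "export function createUser() {}",
    "export function validateToken() {}",
    "import { db } from './db';",
]
print(reorder(chunks, ["validateToken"]))
```

The matching chunk lands at the top (high-attention position) and its signature is echoed at the very end, covering both ends of the attention curve.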
Example
// Without chunking (flat output):
import { db } from '../pages/docs/db';
import { hash } from '../pages/docs/crypto';
const MAX_RETRIES = 3;
export function createUser(...) { ... }
export function validateToken(...) { ... } ← lost in the middle
export function deleteUser(...) { ... }
// With semantic chunking (task: "fix token validation"):
export function validateToken(...) { ... } ← promoted to top
---
export function createUser(...) { ... }
export function deleteUser(...) { ... }
---
import { db } from '../pages/docs/db';
const MAX_RETRIES = 3;
---
// anchor: validateToken signature ← tail anchor
MMR Deduplication
When multiple files are loaded into context, they often contain duplicate imports, shared boilerplate, or very similar code patterns. Maximum Marginal Relevance (MMR) removes this redundancy while preserving unique information.
How it works
- For each line, compute bigram Jaccard similarity against all previously selected lines (the "coverage set").
- The MMR score balances relevance against redundancy: MMR(l) = λ × relevance(l) − (1 − λ) × max_similarity(l, selected)
- Lines with MMR < 0 are suppressed - they add more redundancy than information. The λ parameter defaults to 0.7 (favoring relevance over diversity).
Impact
In a typical multi-file preload, MMR removes 10–25% of redundant content - primarily shared imports and repeated utility patterns.
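A minimal sketch of the filter, using character-bigram Jaccard similarity and λ = 0.7 as described above; the relevance scores are made up for illustration:

```python
def bigrams(s):
    """Set of overlapping character bigrams of s."""
    return {s[i:i + 2] for i in range(len(s) - 1)}

def jaccard(a, b):
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def mmr_filter(lines, relevance, lam=0.7):
    """Keep a line only if MMR = lam*relevance - (1-lam)*max_sim >= 0,
    where max_sim is against all previously selected lines."""
    selected = []
    for line, rel in zip(lines, relevance):
        sims = [jaccard(bigrams(line), bigrams(s)) for s in selected]
        max_sim = max(sims, default=0.0)
        if lam * rel - (1 - lam) * max_sim >= 0:
            selected.append(line)
    return selected

lines = [
    "import { db } from './db';",
    "import { db } from './db';",    # same import from a second file
    "let decay = Math.exp(-a * d);",
]
kept = mmr_filter(lines, relevance=[0.1, 0.1, 0.9])
print(kept)
```

The duplicate import has similarity 1.0 with the already-selected copy, so its MMR goes negative and it is suppressed; the unique logic line survives easily.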
BPE-Aligned Token Optimization
After all content is selected and ordered, a final pass optimizes the raw text for the LLM's BPE tokenizer. Small formatting changes can significantly reduce token count without changing semantics.
Optimization rules
| Before | After | Token savings |
|---|---|---|
| function | fn | 1 token per occurrence |
| -> (with surrounding spaces) | -> (no spaces) | 2 → 1 token |
| => (with surrounding spaces) | => (no spaces) | 2 → 1 token |
| { } | {} | 2 → 1 token |
| 4-space indentation | 2-space indentation | ~50% indentation savings |
| 'static lifetime | elided where safe | 2 tokens per occurrence |
How it works
Each rule is a simple string replacement applied line-by-line after all other compression stages. Rules are derived from BPE token boundary analysis - they target patterns where the tokenizer produces unnecessarily many tokens for semantically equivalent text.
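A sketch of the line-by-line replacement pass; the rule table here is a hypothetical subset, not the actual rule set:

```python
# Hypothetical rules for illustration; real rules come from BPE
# token-boundary analysis of the target tokenizer.
RULES = [
    ("function ", "fn "),  # keyword substitution
    ("    ", "  "),        # 4-space indent -> 2-space indent
    ("{ }", "{}"),         # collapse empty braces
]

def bpe_optimize(text: str) -> str:
    """Apply plain string-replacement rules line by line, after all
    other compression stages, as described above."""
    out = []
    for line in text.splitlines():
        for before, after in RULES:
            line = line.replace(before, after)
        out.append(line)
    return "\n".join(out)

src = "function handler() { }\n    return value;"
print(bpe_optimize(src))
```

Because the rules are context-free string replacements, the pass is cheap and order-independent for non-overlapping patterns.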
Impact
BPE optimization typically saves 3–8% additional tokens on already-compressed output. The savings compound across files and are most significant for verbose languages (TypeScript, Java, C#).
Verifying the impact
Run lean-ctx benchmark run in your project to measure the combined effect of all six algorithms. The benchmark report shows per-file token savings, preservation scores (AST, identifiers, lines), and overall compression ratios.
$ lean-ctx benchmark run
──────────────────────────────────────────
BENCHMARK - /Users/you/project
──────────────────────────────────────────
Files: 143
Total: 285,401 → 42,810 tokens (85% reduction)
Avg/file: 1,997 → 300 tokens
Preservation:
AST: 98.2%
Identifiers: 97.4%
Lines: 96.1%
Mode breakdown:
full: 68% of files
map: 22% of files
signatures: 8% of files
aggressive: 2% of files
Structured Intent Recognition
lean-ctx classifies every interaction into a StructuredIntent - a slot-based representation containing the task type, confidence, file targets, scope, language hint, urgency, and action verb. This drives all downstream decisions: which compression mode to use, how to route context, and what to prioritize.
9 task types are recognized: Explore, Generate, FixBug, Refactor, Test, Review, Config, Deploy, and Document. Each type triggers different compression strategies and context routing.
IntentScope ranges from SingleFile through MultiFile and CrossModule to ProjectWide - determining how broadly context is gathered.
How it works
Compression adapts per intent: FixBug tasks prioritize error lines and test files via the Information Bottleneck filter, Explore tasks use lightweight cleanup to preserve structure, and Generate tasks focus on signatures and types.
The ctx_read auto-mode selector uses the active task type to refine its decision - choosing task mode for bug fixes on large files, map mode for exploration, and signatures mode for documentation tasks.
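A toy sketch of the slot-filling idea; the keyword list, confidence values, and the regex for file targets are illustrative assumptions, not lean-ctx's actual classifier:

```python
from dataclasses import dataclass, field
import re

# Hypothetical verb -> task-type table (subset of the 9 types).
TASK_VERBS = {"fix": "FixBug", "refactor": "Refactor", "test": "Test",
              "document": "Document", "explore": "Explore", "generate": "Generate"}

@dataclass
class StructuredIntent:
    task_type: str
    confidence: float
    targets: list = field(default_factory=list)
    scope: str = "SingleFile"
    action_verb: str = ""

def classify(query: str) -> StructuredIntent:
    words = query.lower().split()
    verb = next((w for w in words if w in TASK_VERBS), None)
    task = TASK_VERBS.get(verb, "Explore")          # default to Explore
    targets = re.findall(r"\S+\.\w+", query)        # file-like tokens, e.g. entropy.rs
    scope = "SingleFile" if len(targets) <= 1 else "MultiFile"
    conf = 0.9 if verb else 0.5                     # assumed confidence scheme
    return StructuredIntent(task, conf, targets, scope, verb or "")

print(classify("fix the NaN bug in entropy.rs"))
```

Downstream, the filled slots drive mode selection exactly as described: a FixBug intent with a single target picks the error-focused task mode.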
Example
Query: "fix the NaN bug in entropy.rs"
StructuredIntent:
task_type: FixBug (confidence: 0.95)
targets: ["entropy.rs"]
scope: SingleFile
language: Rust
urgency: 0.8
action_verb: "fix"
→ Compression: Information Bottleneck filter (error lines boosted)
→ Mode: task (error-focused extraction)
→ Suggestions: tests/entropy_test.rs (deficit detection)
Context Pipeline Architecture
The context pipeline processes information through six distinct layers: Input → Intent → Relevance → Compression → Terse Engine → Delivery. Each layer has defined contracts (input/output types) and emits metrics.
How it works
Input → Intent → Relevance → Compression → Terse Engine → Delivery
│ │ │ │ │ │
│ │ │ │ │ └─ Final output to LLM
│ │ │ │ └─ 4-layer terse pipeline (see below)
│ │ │ └─ AST-aware compression per intent
│ │ └─ Graph heat + relevance scoring
│ └─ StructuredIntent classification
└─ Raw file content / shell output
Per-layer metrics track input tokens, output tokens, compression ratio, and processing time. Aggregated across a session, these reveal where the most savings occur and which layers are bottlenecks.
4-Layer Terse Engine
The terse engine applies four composable compression layers, controlled by compression_level (Off / Lite / Standard / Max). Each layer is independently verified by Lean4 proofs (TerseQuality, TerseEngine - part of 82 total theorems).
| Layer | Name | What it does |
|---|---|---|
| 1 | Dictionary | Common token substitutions and abbreviations (function → fn, return → ret) |
| 2 | Residual | Whitespace normalization, blank-line collapse, boilerplate removal |
| 3 | Scoring | Information-theoretic ranking — keeps high-surprise blocks, prunes low-entropy filler |
| 4 | Pipeline | CEP v1 protocol: delta-only output, structured notation (+/-/~), token budgets |
Context Ledger & Pressure Management
The ContextLedger tracks every file sent to the AI: path, compression mode, original tokens, sent tokens, and timestamp. It calculates real-time context window utilization and pressure.
How it works
Three pressure levels: None (under 70% utilization), Compress (70–90%), and Evict (over 90%). At each level, the system takes different actions - downgrading compression modes for less relevant files or suggesting evictions.
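The three thresholds map directly to a small pressure function; this sketch uses the boundaries exactly as stated above:

```python
def pressure(loaded_tokens: int, window: int) -> str:
    """Map context-window utilization to a pressure level:
    None (< 70%), Compress (70-90%), Evict (> 90%)."""
    util = loaded_tokens / window
    if util < 0.70:
        return "None"
    if util <= 0.90:
        return "Compress"
    return "Evict"

for loaded in (80_000, 115_200, 120_000):
    print(f"{loaded:>7} tokens -> {pressure(loaded, 128_000)}")
```

At Compress, the system starts downgrading modes for non-target files; at Evict it proposes removals, as the re-injection example shows.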
Example
Context Window: 128,000 tokens
Loaded: 89,600 tokens (70% utilization)
Pressure: None → safe
After loading 3 more files:
Loaded: 115,200 tokens (90% utilization)
Pressure: Compress → downgrade non-target files
Re-Injection Plan:
utils.rs: full → signatures (save 2,400 tokens)
helpers.rs: full → map (save 1,800 tokens)
config.rs: full → signatures (save 1,200 tokens)
Protected: auth.rs, login.ts (target files)
Context Deficit Detection identifies missing information: if a bug-fix task targets auth.rs but no test file is loaded, the system suggests tests/auth_test.rs. For config tasks, it recommends Cargo.toml, .env, or Dockerfile.
Smart Re-Injection generates a plan to free context budget by downgrading non-target files (e.g., switching from full to signatures mode) while protecting files critical to the current intent.
Multi-Agent Intelligence
When multiple AI agents collaborate (e.g., a coder and a reviewer), lean-ctx provides structured handoffs, role-based context depth, and cross-agent knowledge sharing.
How it works
HandoffPackage bundles the current session ledger, structured intent, and context snapshot into a single transferable unit - so the receiving agent starts with full situational awareness instead of a blank slate.
Seven agent roles are recognized: Coder, Reviewer, Planner, Explorer, Debugger, Tester, and Orchestrator. Each gets a tailored ContextDepthConfig - a Coder gets more full-file reads, a Reviewer gets more signatures, an Explorer gets broader graph context.
Example
Agent "coder-1" (role: Coder):
max_files_full: 8
preferred_mode: full
context_budget: 80%
Agent "reviewer-1" (role: Reviewer):
max_files_full: 3
preferred_mode: signatures
context_budget: 60%
Shared Knowledge:
K:discovery:auth_bug = "null check missing in line 42"
K:decision:fix_approach = "add Option<T> wrapper"
→ reviewer-1 inherits these facts without re-reading files Cross-Agent Knowledge Sharing lets agents exchange structured facts (e.g., "discovery:auth_bug=null check missing in line 42") via a shared scratchpad. New agents inherit relevant knowledge without re-discovering it.
Community Detection (Louvain)
The core::community module clusters files by their dependency graph using the
Louvain algorithm. This groups tightly-coupled files into communities, enabling
more intelligent context selection — files in the same community as your target are prioritized
for preloading and receive higher relevance scores during task-driven reads.
Code Smell Detection
The ctx_smells tool detects long functions, deep nesting, and high cyclomatic complexity
using the core::smells module. Detected smells are scored with graph-enriched weighting —
files with high PageRank or many dependents receive amplified smell scores, surfacing maintenance
hotspots that have the widest blast radius across the codebase.