
Scientific Intelligence Layer

Six algorithms grounded in information theory, graph theory, and statistical mechanics power the context layer - filtering, prioritizing, and routing information automatically.

lean-ctx goes beyond pattern matching: these six algorithms work together to decide what to keep, what to cut, and where to put it - all automatically, all locally, with zero configuration.

These algorithms power the autonomous intelligence layer. They activate during every ctx_read, ctx_preload, and ctx_overview call - you never invoke them directly.

How the algorithms fit together

Algorithm            | Decides                       | Based on
Spectral Relevance   | Which files matter            | Dependency graph + PageRank
Boltzmann Allocation | How many tokens per file      | Task specificity + relevance score
Predictive Surprise  | Which lines to keep           | Cross-entropy of BPE tokens
MMR Deduplication    | Which lines are redundant     | Bigram Jaccard similarity
Semantic Chunking    | How to order the output       | AST boundaries + attention flow
BPE Optimization     | How to encode the final text  | Token-level compression rules

Predictive Surprise Scoring

Traditional compression filters use Shannon entropy on raw characters, which treats aaaa and import as equally predictable. Predictive Surprise instead measures cross-entropy against BPE token frequency - how surprising each line is to the LLM's tokenizer.

How it works

  1. Each line is tokenized using o200k_base (the GPT-4o tokenizer).
  2. For each token, a Zipfian prior estimates expected frequency based on rank.
  3. The cross-entropy is computed: lines with common tokens (boilerplate) score low; lines with rare tokens (complex logic) score high.
  4. Lines below the surprise threshold are candidates for removal during aggressive compression.
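A minimal sketch of the scoring math, assuming token frequency ranks are already looked up from the o200k_base vocabulary; the normalization to a 0–1 score and the rank lookup are illustrative, not the exact implementation:

// Illustrative surprise scoring under a Zipfian prior.
// `token_ranks` holds each token's frequency rank (rank 1 = most frequent);
// the [0, 1] normalization is an assumption for readability.
fn line_surprise(token_ranks: &[usize], vocab_size: usize) -> f64 {
    if token_ranks.is_empty() {
        return 0.0;
    }
    // Harmonic number H_V normalizes the Zipf prior p(rank) = (1/rank) / H_V.
    let h: f64 = (1..=vocab_size).map(|r| 1.0 / r as f64).sum();
    // Cross-entropy in bits per token: rare (high-rank) tokens are surprising.
    let bits: f64 = token_ranks
        .iter()
        .map(|&rank| -((1.0 / rank as f64) / h).log2())
        .sum::<f64>()
        / token_ranks.len() as f64;
    // Scale by the maximum possible surprise so scores land in [0, 1].
    bits / (vocab_size as f64 * h).log2()
}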

Example

// Low surprise (boilerplate) - candidate for removal:
return Ok(());                    // surprise: 0.12
use std::collections::HashMap;   // surprise: 0.18

// High surprise (unique logic) - always preserved:
let decay = (-alpha * dist as f64).exp();   // surprise: 0.89
scores[j] += heat[i] * decay / degree;      // surprise: 0.94

Why it matters

Predictive Surprise removes 15–30% more boilerplate than character-level entropy while preserving complex logic. It directly models what the LLM considers "informative" because it uses the same BPE vocabulary.


Spectral Relevance Propagation

When you ask lean-ctx to preload context for a task, it needs to decide which files matter. Simple keyword matching only finds files that mention your search terms. Spectral Relevance finds files that are structurally connected to the relevant ones - even if they share no keywords.

How it works

Two graph algorithms run on the project's dependency graph:

  1. Heat Diffusion - Seed files matching the task description receive initial "heat". Heat spreads along import edges with exponential decay, simulating information flow through the codebase.
  2. PageRank Centrality - Files imported by many other files receive higher centrality scores. This identifies structural hubs (e.g., a db.ts imported by 20 modules).

The final relevance score combines both: 0.7 × heat + 0.3 × pagerank. Files below a threshold are excluded from preloading.
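A condensed sketch of the two signals, assuming the dependency graph is an adjacency list and PageRank scores are precomputed; the function names and decay constant α are illustrative:

use std::collections::VecDeque;

// Breadth-first heat diffusion from seed files over import edges.
// `adj[i]` lists the neighbours of file i; `alpha` controls the decay rate.
fn diffuse_heat(adj: &[Vec<usize>], seeds: &[usize], alpha: f64) -> Vec<f64> {
    let mut heat = vec![0.0; adj.len()];
    let mut dist = vec![usize::MAX; adj.len()];
    let mut queue: VecDeque<usize> = VecDeque::new();
    for &s in seeds {
        dist[s] = 0;
        queue.push_back(s);
    }
    while let Some(node) = queue.pop_front() {
        // Heat decays exponentially with graph distance from the nearest seed.
        heat[node] = (-alpha * dist[node] as f64).exp();
        for &next in &adj[node] {
            if dist[next] == usize::MAX {
                dist[next] = dist[node] + 1;
                queue.push_back(next);
            }
        }
    }
    heat
}

// Final relevance: 0.7 × heat + 0.3 × PageRank, as described above.
fn relevance(heat: &[f64], pagerank: &[f64]) -> Vec<f64> {
    heat.iter().zip(pagerank).map(|(h, p)| 0.7 * h + 0.3 * p).collect()
}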

Example

Task: "fix authentication bug"

Direct matches:       auth.ts, login.ts
Heat diffusion adds:  middleware.ts (imports auth.ts)
                      session.ts (imported by auth.ts)
PageRank promotes:    db.ts (imported by 18 files - structural hub)

Result: 5 files preloaded instead of 2, covering the full blast radius.

Zero configuration

The dependency graph builds automatically on first use via load_or_build(). No ctx_graph build needed - it just works.


Boltzmann Context Allocation

Once Spectral Relevance selects the relevant files, each file needs a token budget. Naive approaches give equal budgets or sort by relevance. Boltzmann Allocation uses statistical mechanics to distribute tokens optimally.

How it works

  1. Each file has a relevance score Eᵢ from Spectral Relevance.
  2. An inverse-temperature parameter β is derived from the task: specific tasks ("fix auth bug in login.ts") produce a high β (low temperature - sharp allocation to the top files); broad tasks ("refactor the codebase") produce a low β (high temperature - even distribution).
  3. Token budgets follow the Boltzmann distribution: budgetᵢ = total × e^(β·Eᵢ) / Σⱼ e^(β·Eⱼ)
  4. Minimum and maximum budgets are enforced (128–4096 tokens per file), and the compression mode (full, map, signatures) is chosen from the allocated budget.

Example

Task: "fix the JWT validation in auth middleware"
Task specificity: 0.85 (specific) → β = 4.2

File            | Relevance | Budget | Mode
auth.ts         |      0.92 |  3,200 | full
middleware.ts   |      0.78 |  1,800 | full
session.ts      |      0.45 |    420 | map
db.ts           |      0.31 |    180 | signatures
routes.ts       |      0.22 |    128 | signatures
                                ─────
Total:                         5,728 tokens (within 8,000 budget)
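The allocation itself is a softmax over β·Eᵢ with the clamp applied afterwards. A minimal sketch, with illustrative mode thresholds:

// Boltzmann (softmax) allocation over relevance scores; higher β sharpens
// the split. The 128–4096 clamp follows the text above; the mode cut-offs
// are assumptions.
fn allocate(relevance: &[f64], total_tokens: f64, beta: f64) -> Vec<(f64, &'static str)> {
    let weights: Vec<f64> = relevance.iter().map(|e| (beta * e).exp()).collect();
    let z: f64 = weights.iter().sum();
    weights
        .iter()
        .map(|w| {
            let budget = (total_tokens * w / z).clamp(128.0, 4096.0);
            let mode = if budget >= 1024.0 {
                "full"
            } else if budget >= 256.0 {
                "map"
            } else {
                "signatures"
            };
            (budget, mode)
        })
        .collect()
}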

Reciprocal Rank Fusion (RRF) Cache Eviction

lean-ctx uses Reciprocal Rank Fusion for cache eviction decisions, replacing the earlier Boltzmann-inspired weighted scoring. RRF handles incomparable signals (time in seconds, frequency as counts, size in tokens) without arbitrary weight tuning.

Aspect          | Legacy (Boltzmann)                                                   | Current (RRF)
Signal handling | Mixed units in one formula (0.4×recency + 0.3×frequency + 0.3×size) | Each signal ranked independently, then fused
Weight tuning   | Requires manual weight calibration (arbitrary 0.4/0.3/0.3)          | No weights - only K=60 (standard IR parameter)
Edge cases      | Large files dominate due to log-scaling of size                     | Fair treatment - each signal contributes equally via rank

Formula: RRF(d) = Σᵢ 1/(K + rankᵢ(d)), summed over the ranked signals i - entries with the lowest fused score are evicted first. Because only ranks enter the formula, differences in signal magnitude or units cannot skew the ordering.
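A minimal sketch of the fusion step, assuming each signal (recency, frequency, size) has already been converted into a 1-based rank per cache entry:

// Reciprocal Rank Fusion: each signal contributes 1/(K + rank), so units and
// magnitudes never matter - only relative order. K = 60 as noted above.
fn rrf_fuse(rank_lists: &[Vec<usize>]) -> Vec<f64> {
    const K: f64 = 60.0;
    let entries = rank_lists.first().map_or(0, |r| r.len());
    (0..entries)
        .map(|d| {
            rank_lists
                .iter()
                .map(|ranks| 1.0 / (K + ranks[d] as f64))
                .sum::<f64>()
        })
        .collect()
    // The entries with the lowest fused score are the first eviction candidates.
}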


Semantic Chunking with Attention Bridges

LLMs suffer from the "Lost in the Middle" problem: information at the beginning and end of context receives more attention than content in the middle. Semantic Chunking restructures output to minimize information loss.

How it works

  1. Chunk Detection - Source lines are grouped into semantic chunks (functions, types, imports, loose logic) based on AST boundary heuristics.
  2. Relevance Ordering - Chunks matching the current task are promoted to the top (high-attention position). Remaining chunks are ordered by type priority.
  3. Attention Bridges - Between chunks, lean-ctx inserts minimal bridge markers (---) so the LLM recognizes structural boundaries.
  4. Tail Anchors - The last 2–3 lines of the highest-priority chunk are repeated at the very end of the output, exploiting the recency bias for critical information.

Example

// Without chunking (flat output):
import { db } from './db';
import { hash } from './crypto';
const MAX_RETRIES = 3;
export function createUser(...) { ... }
export function validateToken(...) { ... }    ← lost in the middle
export function deleteUser(...) { ... }

// With semantic chunking (task: "fix token validation"):
export function validateToken(...) { ... }    ← promoted to top
---
export function createUser(...) { ... }
export function deleteUser(...) { ... }
---
import { db } from './db';
const MAX_RETRIES = 3;
---
// anchor: validateToken signature                ← tail anchor
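A simplified sketch of the ordering pass, assuming chunks and their task-relevance flags are already extracted; the bridge marker and tail-anchor length mirror the description above, while the numeric type priorities are an assumption:

// Order chunks for attention: task-relevant chunks first, then by type
// priority, joined with "---" bridges, with a tail anchor repeated at the end.
struct Chunk {
    kind: u8,        // assumed priority: 0 = function, 1 = type, 2 = import, 3 = loose
    relevant: bool,  // matches the current task description
    text: String,
}

fn order_chunks(mut chunks: Vec<Chunk>) -> String {
    // Relevant chunks are promoted to the top (high-attention position).
    chunks.sort_by_key(|c| (!c.relevant, c.kind));
    // Tail anchor: last lines of the top chunk, to exploit the recency bias.
    let anchor = chunks
        .first()
        .map(|c| c.text.lines().rev().take(3).collect::<Vec<_>>())
        .unwrap_or_default();
    let body = chunks
        .iter()
        .map(|c| c.text.as_str())
        .collect::<Vec<_>>()
        .join("\n---\n");
    let tail = anchor.into_iter().rev().collect::<Vec<_>>().join("\n");
    format!("{}\n---\n// anchor:\n{}", body, tail)
}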

MMR Deduplication

When multiple files are loaded into context, they often contain duplicate imports, shared boilerplate, or very similar code patterns. Maximum Marginal Relevance (MMR) removes this redundancy while preserving unique information.

How it works

  1. For each line, compute bigram Jaccard similarity against all previously selected lines (the "coverage set").
  2. The MMR score balances relevance against redundancy: MMR(l) = λ × relevance(l) - (1 - λ) × max_similarity(l, selected)
  3. Lines with MMR < 0 are suppressed - they add more redundancy than information. The λ parameter defaults to 0.7 (favoring relevance over diversity).
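A minimal sketch of the per-line decision, assuming character bigrams and a precomputed relevance score (both assumptions; λ = 0.7 as noted above):

use std::collections::HashSet;

// Character-bigram Jaccard similarity between two lines.
fn bigram_jaccard(a: &str, b: &str) -> f64 {
    let grams = |s: &str| -> HashSet<(char, char)> {
        let chars: Vec<char> = s.chars().collect();
        chars.windows(2).map(|w| (w[0], w[1])).collect()
    };
    let (ga, gb) = (grams(a), grams(b));
    if ga.is_empty() && gb.is_empty() {
        return 1.0;
    }
    ga.intersection(&gb).count() as f64 / ga.union(&gb).count() as f64
}

// MMR(l) = λ·relevance(l) - (1-λ)·max_similarity(l, selected); lines scoring
// below zero add more redundancy than information and are suppressed.
fn mmr_keep(line: &str, relevance: f64, selected: &[&str], lambda: f64) -> bool {
    let max_sim = selected
        .iter()
        .map(|s| bigram_jaccard(line, s))
        .fold(0.0_f64, f64::max);
    lambda * relevance - (1.0 - lambda) * max_sim >= 0.0
}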

Impact

In a typical multi-file preload, MMR removes 10–25% of redundant content - primarily shared imports and repeated utility patterns.


BPE-Aligned Token Optimization

After all content is selected and ordered, a final pass optimizes the raw text for the LLM's BPE tokenizer. Small formatting changes can significantly reduce token count without changing semantics.

Optimization rules

Before                | After                | Token savings
function              | fn                   | 1 token per occurrence
 ->  (spaced arrow)   | ->                   | 2 → 1 token
 =>  (spaced arrow)   | =>                   | 2 → 1 token
{ }                   | {}                   | 2 → 1 token
4-space indentation   | 2-space indentation  | ~50% indentation savings
'static lifetime      | elided where safe    | 2 tokens per occurrence

How it works

Each rule is a simple string replacement applied line-by-line after all other compression stages. Rules are derived from BPE token boundary analysis - they target patterns where the tokenizer produces unnecessarily many tokens for semantically equivalent text.
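A sketch of that pass, with the rules expressed as plain (from, to) string pairs; the subset below mirrors the table above:

// Line-by-line string replacements derived from BPE boundary analysis.
// This subset mirrors the table above; the full rule set is larger.
const RULES: &[(&str, &str)] = &[
    ("function ", "fn "),
    (" -> ", "->"),
    (" => ", "=>"),
    ("{ }", "{}"),
    ("    ", "  "), // halve 4-space indentation
];

fn optimize_line(line: &str) -> String {
    RULES
        .iter()
        .fold(line.to_string(), |acc, &(from, to)| acc.replace(from, to))
}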

Impact

BPE optimization typically saves 3–8% additional tokens on already-compressed output. The savings compound across files and are most significant for verbose languages (TypeScript, Java, C#).


Verifying the impact

Run lean-ctx benchmark run in your project to measure the combined effect of all six algorithms. The benchmark report shows per-file token savings, preservation scores (AST, identifiers, lines), and overall compression ratios.

$ lean-ctx benchmark run
──────────────────────────────────────────
BENCHMARK - /Users/you/project
──────────────────────────────────────────
Files:       143
Total:       285,401 → 42,810 tokens (85% reduction)
Avg/file:    1,997 → 300 tokens

Preservation:
  AST:         98.2%
  Identifiers: 97.4%
  Lines:       96.1%

Mode breakdown:
  full:        68% of files
  map:         22% of files
  signatures:   8% of files
  aggressive:   2% of files

Structured Intent Recognition

lean-ctx classifies every interaction into a StructuredIntent - a slot-based representation containing the task type, confidence, file targets, scope, language hint, urgency, and action verb. This drives all downstream decisions: which compression mode to use, how to route context, and what to prioritize.

9 task types are recognized: Explore, Generate, FixBug, Refactor, Test, Review, Config, Deploy, and Document. Each type triggers different compression strategies and context routing.

IntentScope ranges from SingleFile through MultiFile and CrossModule to ProjectWide - determining how broadly context is gathered.
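Sketched as Rust types, the slot layout looks roughly like this (field names follow the description above; the actual definitions in the codebase may differ):

// Slot-based intent representation as described above (illustrative types).
enum TaskType { Explore, Generate, FixBug, Refactor, Test, Review, Config, Deploy, Document }

enum IntentScope { SingleFile, MultiFile, CrossModule, ProjectWide }

struct StructuredIntent {
    task_type: TaskType,
    confidence: f64,          // classifier confidence, 0.0–1.0
    targets: Vec<String>,     // file targets mentioned in the query
    scope: IntentScope,
    language: Option<String>, // language hint, e.g. "Rust"
    urgency: f64,
    action_verb: String,      // e.g. "fix", "refactor"
}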

How it works

Compression adapts per intent: FixBug tasks prioritize error lines and test files via the Information Bottleneck filter, Explore tasks use lightweight cleanup to preserve structure, and Generate tasks focus on signatures and types.

The ctx_read auto-mode selector uses the active task type to refine its decision - choosing task mode for bug fixes on large files, map mode for exploration, and signatures mode for documentation tasks.

Example

Query: "fix the NaN bug in entropy.rs"

StructuredIntent:
  task_type:    FixBug (confidence: 0.95)
  targets:      ["entropy.rs"]
  scope:        SingleFile
  language:     Rust
  urgency:      0.8
  action_verb:  "fix"

→ Compression: Information Bottleneck filter (error lines boosted)
→ Mode:        task (error-focused extraction)
→ Suggestions: tests/entropy_test.rs (deficit detection)

Context Pipeline Architecture

The context pipeline processes information through six distinct layers: Input → Intent → Relevance → Compression → Translation → Delivery. Each layer has defined contracts (input/output types) and emits metrics.

How it works

Input → Intent → Relevance → Compression → Terse Engine → Delivery
  │        │          │            │              │              │
  │        │          │            │              │              └─ Final output to LLM
  │        │          │            │              └─ 4-layer terse pipeline (see below)
  │        │          │            └─ AST-aware compression per intent
  │        │          └─ Graph heat + relevance scoring
  │        └─ StructuredIntent classification
  └─ Raw file content / shell output

Per-layer metrics track input tokens, output tokens, compression ratio, and processing time. Aggregated across a session, these reveal where the most savings occur and which layers are bottlenecks.

4-Layer Terse Engine

The terse engine applies four composable compression layers, controlled by compression_level (Off / Lite / Standard / Max). Each layer is independently verified by Lean4 proofs (TerseQuality, TerseEngine — part of 82 total theorems).

Layer | Name       | What it does
1     | Dictionary | Common token substitutions and abbreviations (function → fn, return → ret)
2     | Residual   | Whitespace normalization, blank-line collapse, boilerplate removal
3     | Scoring    | Information-theoretic ranking - keeps high-surprise blocks, prunes low-entropy filler
4     | Pipeline   | CEP v1 protocol: delta-only output, structured notation (+/-/~), token budgets
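A sketch of how compression_level might gate the layers; the trait and the level-to-layer mapping below are assumptions, not the verified interface:

// Composable compression layers selected by level (illustrative mapping).
enum CompressionLevel { Off, Lite, Standard, Max }

trait TerseLayer {
    fn apply(&self, input: &str) -> String;
}

fn layers_for(level: &CompressionLevel) -> usize {
    // Assumed mapping: each step up enables one more of the four layers.
    match level {
        CompressionLevel::Off => 0,
        CompressionLevel::Lite => 2,     // Dictionary + Residual
        CompressionLevel::Standard => 3, // + Scoring
        CompressionLevel::Max => 4,      // + Pipeline (CEP v1)
    }
}

fn run(layers: &[Box<dyn TerseLayer>], level: &CompressionLevel, mut text: String) -> String {
    for layer in layers.iter().take(layers_for(level)) {
        text = layer.apply(&text);
    }
    text
}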

Context Ledger & Pressure Management

The ContextLedger tracks every file sent to the AI: path, compression mode, original tokens, sent tokens, and timestamp. It calculates real-time context window utilization and pressure.

How it works

Three pressure levels: None (under 70% utilization), Compress (70–90%), and Evict (over 90%). At each level, the system takes different actions - downgrading compression modes for less relevant files or suggesting evictions.
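The threshold logic as a minimal sketch; the boundary behaviour at exactly 70% and 90% follows the example below:

// Pressure level from context-window utilization (thresholds from the text).
enum Pressure { None, Compress, Evict }

fn pressure(loaded_tokens: u64, window_tokens: u64) -> Pressure {
    let utilization = loaded_tokens as f64 / window_tokens as f64;
    if utilization > 0.90 {
        Pressure::Evict        // suggest evictions / re-injection plan
    } else if utilization > 0.70 {
        Pressure::Compress     // downgrade compression modes for non-target files
    } else {
        Pressure::None
    }
}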

Example

Context Window: 128,000 tokens
Loaded: 89,600 tokens (70% utilization)
Pressure: None → safe

After loading 3 more files:
Loaded: 115,200 tokens (90% utilization)
Pressure: Compress → downgrade non-target files

Re-Injection Plan:
  utils.rs:    full → signatures  (save 2,400 tokens)
  helpers.rs:  full → map         (save 1,800 tokens)
  config.rs:   full → signatures  (save 1,200 tokens)
  Protected:   auth.rs, login.ts  (target files)

Context Deficit Detection identifies missing information: if a bug-fix task targets auth.rs but no test file is loaded, the system suggests tests/auth_test.rs. For config tasks, it recommends Cargo.toml, .env, or Dockerfile.

Smart Re-Injection generates a plan to free context budget by downgrading non-target files (e.g., switching from full to signatures mode) while protecting files critical to the current intent.


Multi-Agent Intelligence

When multiple AI agents collaborate (e.g., a coder and a reviewer), lean-ctx provides structured handoffs, role-based context depth, and cross-agent knowledge sharing.

How it works

HandoffPackage bundles the current session ledger, structured intent, and context snapshot into a single transferable unit - so the receiving agent starts with full situational awareness instead of a blank slate.

Seven agent roles are recognized: Coder, Reviewer, Planner, Explorer, Debugger, Tester, and Orchestrator. Each gets a tailored ContextDepthConfig - a Coder gets more full-file reads, a Reviewer gets more signatures, an Explorer gets broader graph context.
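A sketch of the role-to-depth mapping; the Coder and Reviewer values mirror the example below, while the catch-all defaults are illustrative:

// Per-role context depth (illustrative shape; see the example below).
enum AgentRole { Coder, Reviewer, Planner, Explorer, Debugger, Tester, Orchestrator }

struct ContextDepthConfig {
    max_files_full: u8,
    preferred_mode: &'static str, // "full" | "map" | "signatures"
    context_budget: f64,          // fraction of the window this agent may use
}

fn depth_for(role: &AgentRole) -> ContextDepthConfig {
    match role {
        AgentRole::Coder => ContextDepthConfig { max_files_full: 8, preferred_mode: "full", context_budget: 0.8 },
        AgentRole::Reviewer => ContextDepthConfig { max_files_full: 3, preferred_mode: "signatures", context_budget: 0.6 },
        // Remaining roles get their own tuned defaults (values assumed here).
        _ => ContextDepthConfig { max_files_full: 5, preferred_mode: "map", context_budget: 0.7 },
    }
}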

Example

Agent "coder-1" (role: Coder):
  max_files_full: 8
  preferred_mode: full
  context_budget: 80%

Agent "reviewer-1" (role: Reviewer):
  max_files_full: 3
  preferred_mode: signatures
  context_budget: 60%

Shared Knowledge:
  K:discovery:auth_bug = "null check missing in line 42"
  K:decision:fix_approach = "add Option<T> wrapper"
  → reviewer-1 inherits these facts without re-reading files

Cross-Agent Knowledge Sharing lets agents exchange structured facts (e.g., "discovery:auth_bug=null check missing in line 42") via a shared scratchpad. New agents inherit relevant knowledge without re-discovering it.


Community Detection (Louvain)

The core::community module clusters files by their dependency graph using the Louvain algorithm. This groups tightly-coupled files into communities, enabling more intelligent context selection — files in the same community as your target are prioritized for preloading and receive higher relevance scores during task-driven reads.
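A minimal sketch of how community membership can feed back into relevance scoring; the boost factor is illustrative (the Louvain clustering itself lives in core::community):

// Boost files that share a Louvain community with the task's target file.
// `community[i]` is the community id of file i; the 1.25 factor is assumed.
fn community_boost(scores: &mut [f64], community: &[usize], target: usize) {
    let target_community = community[target];
    for (score, &c) in scores.iter_mut().zip(community) {
        if c == target_community {
            *score *= 1.25;
        }
    }
}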

Code Smell Detection

The ctx_smells tool detects long functions, deep nesting, and high cyclomatic complexity using the core::smells module. Detected smells are scored with graph-enriched weighting — files with high PageRank or many dependents receive amplified smell scores, surfacing maintenance hotspots that have the widest blast radius across the codebase.
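A sketch of the graph-enriched weighting; the exact amplification formula here is an assumption:

// Amplify raw smell scores by graph centrality so hotspots with the widest
// blast radius surface first (illustrative weighting, not the exact formula).
fn weighted_smell(raw_score: f64, pagerank: f64, dependents: usize) -> f64 {
    raw_score * (1.0 + pagerank) * (1.0 + (dependents as f64).ln_1p() / 10.0)
}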


See also