Benchmark

Don't Trust.
Verify.

Run lean-ctx benchmark run in any project. Real token counts. Real accuracy metrics. Measured with tiktoken (o200k_base).

How it stays honest

Measured. Verified.

Benchmark runs locally, counts tokens with the exact tokenizer, and rejects compressions that drop below the quality bar.

Exact token count

Counts with the same tokenizer modern OpenAI models use - no estimates, no guesswork.

tiktoken o200k_base

Quality guard

Scores AST preservation, identifiers, and line structure. Failing outputs are blocked automatically.

threshold: Q ≥ 95% · ρ ≥ 15%

Reproducible

Runs on your repo. Same inputs → same numbers. Great for CI and regression checks.

offline · deterministic
See the difference

Before & After

The same file. The same information. Dramatically fewer tokens.

Without lean-ctx
// src/auth.ts · mode=full
import { verify, sign } from 'jsonwebtoken';
import bcrypt from 'bcryptjs';
…
3,517 tokens
With lean-ctx (map mode)
// src/auth.ts · mode=map
exports: AuthService, validateToken, …
deps: jsonwebtoken, bcryptjs, ioredis
…
412 tokens

88% fewer tokens

Three steps to verified savings

How It Works

01

Point at any file or directory

Pass a single file, a directory, or a glob pattern. The benchmark engine processes everything it finds.

lean-ctx benchmark run src/
02

Exact token measurement

Uses tiktoken with the o200k_base encoding - the same tokenizer GPT-4o and other modern OpenAI models use. No estimates - real token counts (measurement sketched after these steps).

tiktoken o200k_base
03

Savings per mode

Get accuracy scores and savings percentages for every compression mode. Pick the right mode for each use case.

modes: 10
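
To make the measurement step concrete, here is a minimal sketch of an exact count with the official tiktoken package. The compressed-output file is a hypothetical stand-in; only the encoding name and the tiktoken calls are real.

import tiktoken

enc = tiktoken.get_encoding("o200k_base")  # the exact encoding, no estimates

def count_tokens(text: str) -> int:
    """Exact token count under o200k_base."""
    return len(enc.encode(text))

# Hypothetical inputs: the original file and one mode's compressed output.
original = open("src/auth.ts").read()
compressed = open("auth.map.txt").read()

before, after = count_tokens(original), count_tokens(compressed)
print(f"{before:,} -> {after:,} tokens ({1 - after / before:.0%} saved)")
# e.g. 3,517 -> 412 tokens (88% saved)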
Real output

Benchmark in Action

Run the benchmark on any file in your project. The output shows exact token counts for each compression mode, savings percentage, and quality preservation scores.

Per-file breakdown - tokens before and after each mode

Quality scores - AST, identifiers, and code lines preserved

Aggregated totals - directory-wide savings with best mode recommendation

lean-ctx benchmark run

$ lean-ctx benchmark run src/auth.ts

◆ lean-ctx Benchmark

────────────────────────────────────────

src/auth.ts (123 lines, 3,517 tokens)

────────────────────────────────────────

Mode         Tokens   Saved   Rate
full          3,517       0     0%
map             412   3,105    88%
signatures      252   3,265    93%
diff            187   3,330    95%
aggressive      298   3,219    92%
entropy         312   3,205    91%

────────────────────────────────────────

Quality: AST 98% | Idents 97% | Lines 96%

Encoding: tiktoken o200k_base | Time: 12ms

Pick the right mode for each task

Read Modes Compared

full 0%

Files you will edit

Everything - full content cached for re-reads at ~13 tokens

map 70–90%

Context-only files

Dependency graph, exports, key signatures

signatures 55–93%

API surface exploration

Function/class/type signatures only

diff 80–95%

After edits

Changed lines with minimal surrounding context

aggressive 75–90%

Large boilerplate files

Structure and logic, syntax stripped

entropy 70–83%

Noisy files (JSDoc, comments)

High-entropy lines only (Shannon + Jaccard filtering - sketched after this comparison)

task 65–85%

Task-focused reads (e.g. 'fix auth bug')

Task-relevant code + dependency context via Knowledge Graph + IB filter

auto 70–99%

Default - lean-ctx picks the best mode automatically

Adapts per file: type, size bucket, recency, task relevance

reference 80–95%

API docs and reference lookup

Public API, types, signatures, docstrings

lines:N-M 90–99%

Read a specific line range - surgical precision

Exact lines requested, plus minimal surrounding context
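
As an illustration of what entropy mode's Shannon + Jaccard filtering means, here is a simplified Python sketch. The thresholds and the word-level tokenization are invented for the example; lean-ctx's actual filter is internal.

import math

def shannon_entropy(line: str) -> float:
    """Bits per character over the line's character distribution."""
    if not line:
        return 0.0
    probs = [line.count(c) / len(line) for c in set(line)]
    return -sum(p * math.log2(p) for p in probs)

def jaccard(a: set, b: set) -> float:
    """Similarity of two token sets: |intersection| / |union|."""
    return len(a & b) / len(a | b) if a | b else 1.0

def entropy_filter(lines, min_entropy=3.0, max_similarity=0.8):
    """Keep high-information lines, drop near-duplicates of lines already kept."""
    kept, kept_token_sets = [], []
    for line in lines:
        if shannon_entropy(line) < min_entropy:
            continue  # low-information: blank lines, separators, trivial noise
        tokens = set(line.split())
        if any(jaccard(tokens, seen) > max_similarity for seen in kept_token_sets):
            continue  # near-duplicate of a line already kept
        kept.append(line)
        kept_token_sets.append(tokens)
    return kept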

lean-ctx's ctx_smart_read automatically picks the optimal mode using Bayesian prediction based on file type, size, and context.
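
The sketch below shows the flavor of that prediction, not lean-ctx's actual model: a naive-Bayes-style scorer over two coarse features, with priors and likelihoods invented for the example.

# Hypothetical priors and per-feature likelihoods, for illustration only.
PRIOR = {"full": 0.2, "map": 0.35, "signatures": 0.25, "entropy": 0.2}

LIKELIHOOD = {
    "size": {
        "small": {"full": 0.7, "map": 0.3, "signatures": 0.4, "entropy": 0.3},
        "large": {"full": 0.3, "map": 0.7, "signatures": 0.6, "entropy": 0.7},
    },
    "will_edit": {
        True:  {"full": 0.9, "map": 0.1, "signatures": 0.2, "entropy": 0.2},
        False: {"full": 0.1, "map": 0.9, "signatures": 0.8, "entropy": 0.8},
    },
}

def pick_mode(size_bucket: str, will_edit: bool) -> str:
    """MAP estimate: the mode with the highest prior x likelihood score."""
    scores = {
        mode: PRIOR[mode]
        * LIKELIHOOD["size"][size_bucket][mode]
        * LIKELIHOOD["will_edit"][will_edit][mode]
        for mode in PRIOR
    }
    return max(scores, key=scores.get)

print(pick_mode("large", will_edit=False))  # -> "map" under these numbers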

Stage by stage

Advanced Compression Pipeline

Beyond mode selection, lean-ctx applies a multi-stage optimization pipeline that adapts to file type, session context, and task intent:

Thompson Sampling 5–15%

Learns optimal compression thresholds per file type using multi-armed bandit exploration (explore vs. exploit) - sketched after this overview

AST Pruning 40–70%

Language-aware pruning via Tree-sitter - removes function bodies, comments, and boilerplate while preserving API signatures

IDF Dedup 10–30%

Cross-file deduplication using inverse document frequency - eliminates content already seen in the session

IB Filter 15–25%

Task-aware filtering using the Information Bottleneck principle - keeps only content relevant to the current task

Verbatim Compaction 5–20%

Collapses repetitive structures (imports, log lines, boilerplate) into counted summaries - also sketched below

These stages are cumulative - applied in sequence, they can reduce a 1000-line file to under 50 tokens while preserving all task-relevant information. The pipeline is fully automatic and requires no configuration.
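
To make two of these stages concrete, here are minimal sketches. First, Thompson sampling over a handful of candidate thresholds; the arm definitions and the reward signal are invented for the example.

import random

# threshold -> [alpha, beta] parameters of a Beta posterior (both start at 1)
arms = {0.5: [1, 1], 0.7: [1, 1], 0.9: [1, 1]}

def choose_threshold() -> float:
    """Sample each arm's posterior and play the best draw (explore vs. exploit)."""
    return max(arms, key=lambda t: random.betavariate(*arms[t]))

def record(threshold: float, passed_quality_gate: bool) -> None:
    """Treat 'compression passed the quality gate' as the reward signal."""
    arms[threshold][0 if passed_quality_gate else 1] += 1

And verbatim compaction, collapsing a run of import lines into a counted summary - again a simplified stand-in, not lean-ctx's internals:

import re
from itertools import groupby

def compact_imports(lines: list[str]) -> list[str]:
    """Collapse runs of 3+ consecutive import lines into one summary line."""
    out = []
    for is_import, run in groupby(lines, key=lambda l: bool(re.match(r"\s*import\b", l))):
        run = list(run)
        if is_import and len(run) >= 3:
            out.append(f"// {len(run)} imports collapsed")
        else:
            out.extend(run)
    return out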

Verified preservation

Compression Quality

Quality threshold (composite)

95%

Compressed output is only used if the composite quality score stays at or above 95%.

Minimum density

15%

Blocks low-information output with a minimum signal density of 15% (ρ).

Weighting

50/30/20

Composite = AST 50% + identifiers 30% + lines 20% - so structure matters most.
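
Putting the three cards together, the gate reduces to a few lines. This is just the published rule restated as code; the preservation scores and the density value are assumed to arrive from upstream as fractions.

def accept_compression(ast: float, idents: float, lines: float, density: float) -> bool:
    """Composite Q = 0.5*AST + 0.3*identifiers + 0.2*lines, gated with density rho."""
    q = 0.5 * ast + 0.3 * idents + 0.2 * lines
    return q >= 0.95 and density >= 0.15

# Scores from the sample run above (the density value is an assumed example):
print(accept_compression(ast=0.98, idents=0.97, lines=0.96, density=0.30))  # True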

Information density principle

Why Fewer Tokens = Higher Signal Density

LLMs have a fixed attention budget. Every token in the context window competes for attention weights. Filling the window with boilerplate means less attention on the code that matters.

By removing noise before it reaches the model, lean-ctx increases the information density of every request. The result: higher signal-to-noise ratio, less context dilution, and the model stays within useful context limits.

Higher signal-to-noise ratio

10K tokens of focused context outperform 200K of boilerplate. The model spends its attention on logic, not on JSDoc comments and import boilerplate.

Reduced context noise

Context noise dilutes the model's attention window. Removing it helps the model stay grounded in actual code structure and reduces the chance of hallucination.

Lower cost per answer

Fewer input tokens means lower API costs and more messages within your rate limit. The same quota goes further - for every AI tool you use.

Real-world examples

Measured on Real Code

Representative snapshots - your numbers will vary by file and codebase.

React Component 88%

450 lines - map mode

12,840 → 1,541
Rust Module 93%

820 lines - signatures mode

18,290 → 1,280
Express API 91%

1,200 lines - aggressive mode

31,500 → 2,835
Python ML Pipeline 83%

680 lines - entropy mode

15,400 → 2,618
TypeScript Config 95%

340 lines - diff mode

8,750 → 437
Transparency

Benchmark
Methodology

Every number on this page is reproducible. Here's exactly how we measure.

Tokenizer

All token counts use tiktoken with the o200k_base encoding - the same tokenizer used by GPT-4o and other modern OpenAI models. No estimates or approximations.

Quality Threshold

Compressed output is only used if the composite quality score stays at or above 95%. Composite = AST preservation (50%) + identifier preservation (30%) + line coverage (20%).

Reproduce Locally

Run lean-ctx benchmark run src/ on your own codebase. The output shows exact token counts for each compression mode, savings percentage, and quality preservation scores.

Disclaimer

Results vary by file type, size, language, and read mode. The "60-99%" range reflects real-world variance: small structured files compress more, large unstructured files compress less. Cached re-reads (~13 tokens) represent the best case.

Measure your actual savings.

Install lean-ctx and run benchmark run on your codebase. Real numbers, your files, your savings.

lean-ctx benchmark run src/

Works on any codebase. No config needed. Results in seconds.