Don't Trust.
Verify.
Run lean-ctx benchmark run in any project. Real token counts. Real accuracy metrics. Measured with tiktoken (o200k_base).
Measured. Verified.
The benchmark runs locally, counts tokens with tiktoken's o200k_base encoding, and rejects compressions that drop below the quality bar.
Exact token count
Counts with o200k_base, the tokenizer GPT-4o uses - no estimates, no guesswork.
tiktoken o200k_base
Quality guard
Scores AST preservation, identifiers, and line structure. Failing outputs are blocked automatically.
threshold: Q ≥ 95% · ρ ≥ 15%
Reproducible
Runs on your repo. Same inputs → same numbers. Great for CI and regressions.
offline · deterministic
Before & After
The same file. The same information. Dramatically fewer tokens.
88% fewer tokens
How It Works
Point at any file or directory
Pass a single file, a directory, or a glob pattern. The benchmark engine processes everything it finds.
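lean-ctx benchmark run src/
Exact token measurement
Uses tiktoken with the o200k_base encoding, the tokenizer used by GPT-4o. Counts are exact for that encoding, not character-based estimates; models with their own tokenizers, such as Claude, will report somewhat different totals.
tiktoken o200k_base
Savings per mode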
Get accuracy scores and savings percentages for every compression mode. Pick the right mode for each use case.
modes: 10
Benchmark in Action
Run the benchmark on any file in your project. The output shows exact token counts for each compression mode, savings percentage, and quality preservation scores.
Per-file breakdown - tokens before and after each mode
Quality scores - AST, identifiers, and code lines preserved
Aggregated totals - directory-wide savings with best mode recommendation
$ lean-ctx benchmark run src/auth.ts
lean-ctx Benchmark
────────────────────────────────────────
src/auth.ts (123 lines, 3,517 tokens)
────────────────────────────────────────
Mode          Tokens    Saved   Rate
full           3,517        0     0%
map              412    3,105    88%
signatures       252    3,265    93%
diff             187    3,330    95%
aggressive       298    3,219    92%
entropy          312    3,205    91%
────────────────────────────────────────
Quality: AST 98% | Idents 97% | Lines 96%
Encoding: tiktoken o200k_base | Time: 12ms
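The Saved and Rate columns in the sample output follow directly from the token counts. A minimal sketch of that arithmetic, assuming the rate is rounded to a whole percent (the rounding rule is an assumption, though it matches every row above); the `savings` helper is hypothetical and not part of lean-ctx:

```python
def savings(before: int, after: int) -> tuple[int, int]:
    """Return (tokens saved, savings rate as a whole percent)."""
    if before <= 0:
        raise ValueError("before must be a positive token count")
    saved = before - after
    # Assumption: the displayed rate is rounded to the nearest percent.
    return saved, round(100 * saved / before)


# Token counts copied from the sample src/auth.ts run.
FULL_TOKENS = 3517
MODE_TOKENS = {
    "map": 412,
    "signatures": 252,
    "diff": 187,
    "aggressive": 298,
    "entropy": 312,
}

for mode, tokens in MODE_TOKENS.items():
    saved, rate = savings(FULL_TOKENS, tokens)
    print(f"{mode:<12}{tokens:>8,}{saved:>9,}{rate:>6}%")
```

Running the same helper over your own benchmark output is a quick way to sanity-check a report before wiring it into CI.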
Read Modes Compared
full · 0% · Files you will edit
Everything - full content cached for re-reads at ~13 tokens
map · 70-90% · Context-only files
Dependency graph, exports, key signatures
signatures · 55-93% · API surface exploration
Function/class/type signatures only
diff · 80-95% · After edits
Changed lines with minimal surrounding context
aggressive · 75-90% · Large boilerplate files
Structure and logic, syntax stripped
entropy · 70-83% · Noisy files (JSDoc, comments)
High-entropy lines only (Shannon + Jaccard filtering)
task · 65-85% · Task-focused reads (e.g. 'fix auth bug')
Task-relevant code + dependency context via Knowledge Graph + IB filter
auto · 70-99% · Default - lean-ctx picks the best mode automatically
Adapts per file: type, size bucket, recency, task relevance
reference · 80-95% · API docs and reference lookup
Public API, types, signatures, docstrings
lines:N-M · 90-99% · Read a specific line range - surgical precision
Exact lines requested, plus minimal surrounding context
lean-ctx's ctx_smart_read automatically picks the optimal mode using Bayesian prediction based on file type, size, and context.
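To make the entropy mode's "Shannon + Jaccard filtering" concrete, here is a rough sketch of the general idea: drop lines whose character-level Shannon entropy is low (separators, repeated filler), then drop lines that are near-duplicates of ones already kept, as measured by Jaccard similarity over whitespace-separated tokens. The function names, both thresholds, and the choice of character-level entropy are assumptions made for illustration; this is not lean-ctx's actual implementation.

```python
import math
from collections import Counter


def shannon_entropy(line: str) -> float:
    """Character-level Shannon entropy of a line, in bits per character."""
    text = line.strip()
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the whitespace-separated token sets of two lines."""
    set_a, set_b = set(a.split()), set(b.split())
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def filter_lines(lines, min_entropy=3.0, max_similarity=0.8):
    """Keep high-entropy lines that are not near-duplicates of kept lines.

    Both thresholds are illustrative values, not lean-ctx defaults.
    """
    kept = []
    for line in lines:
        if shannon_entropy(line) < min_entropy:
            continue  # low information: separators, blank or repetitive lines
        if any(jaccard(line, prev) > max_similarity for prev in kept):
            continue  # near-duplicate of a line that was already kept
        kept.append(line)
    return kept
```

A separator such as `// ----------` scores well under the entropy threshold and is dropped, while a second copy of an already-kept statement is removed by the Jaccard check.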
Advanced Compression Pipeline
Beyond mode selection, lean-ctx applies a multi-stage optimization pipeline that adapts to file type, session context, and task intent:
Learns optimal compression thresholds per file type using multi-armed bandit exploration (explore vs exploit)
Language-aware pruning via Tree-sitter - removes function bodies, comments, and boilerplate while preserving API signatures
Cross-file deduplication using inverse document frequency - eliminates content already seen in the session
Task-aware filtering using the Information Bottleneck principle - keeps only content relevant to the current task
Collapses repetitive structures (imports, log lines, boilerplate) into counted summaries
These stages are cumulative - applied in sequence, they can reduce a 1000-line file to under 50 tokens while preserving all task-relevant information. The pipeline is fully automatic and requires no configuration.
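As an illustration of the last stage listed above (collapsing repetitive structures into counted summaries), the sketch below groups consecutive lines of the same kind and replaces long runs of imports or log statements with a single summary line. The line classifier, the summary format, and the `min_run` cutoff are all hypothetical; lean-ctx's own pipeline may work quite differently.

```python
from itertools import groupby


def line_kind(line: str) -> str:
    """Very rough classifier for a source line (illustrative only)."""
    stripped = line.strip()
    if stripped.startswith(("import ", "from ")):
        return "import"
    if stripped.startswith(("console.log(", "print(", "logger.")):
        return "log"
    return "code"


def collapse_runs(lines, min_run=3):
    """Replace runs of >= min_run import/log lines with a counted summary."""
    out = []
    for kind, group in groupby(lines, key=line_kind):
        run = list(group)
        if kind != "code" and len(run) >= min_run:
            out.append(f"[{len(run)} {kind} lines]")
        else:
            out.extend(run)  # short runs and ordinary code pass through
    return out
```

For example, three consecutive `import` lines followed by `x = 1` collapse to `[3 import lines]` and `x = 1`, while a run of only two imports is left untouched.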
Compression Quality
Quality threshold (composite)
Compressed output is only used if the composite quality score stays at or above 95%.
Minimum density
Blocks low-information output with a minimum signal density of 15% (ρ).
Weighting
Composite = AST 50% + identifiers 30% + lines 20% - so structure matters most.
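The weighting and the two thresholds can be written out as a small gate. This is a sketch of the stated formula only: the class and function names are hypothetical, scores are expressed as fractions between 0 and 1, and signal density is taken as an input because the page does not say how it is computed.

```python
from dataclasses import dataclass

# Weights and thresholds as stated on this page.
AST_WEIGHT, IDENT_WEIGHT, LINE_WEIGHT = 0.50, 0.30, 0.20
MIN_QUALITY = 0.95
MIN_DENSITY = 0.15


@dataclass(frozen=True)
class QualityScores:
    ast: float          # AST preservation, 0..1
    identifiers: float  # identifier preservation, 0..1
    lines: float        # line coverage, 0..1


def composite(scores: QualityScores) -> float:
    """Weighted composite quality score."""
    return (
        AST_WEIGHT * scores.ast
        + IDENT_WEIGHT * scores.identifiers
        + LINE_WEIGHT * scores.lines
    )


def accept(scores: QualityScores, density: float) -> bool:
    """A compressed output passes only if both thresholds are met."""
    return composite(scores) >= MIN_QUALITY and density >= MIN_DENSITY
```

With the scores from the sample run (AST 98%, identifiers 97%, lines 96%), the composite is 0.5 × 0.98 + 0.3 × 0.97 + 0.2 × 0.96 = 0.973, which clears the 95% bar. An output with AST 90% and the other two at 99% comes to 0.945 and is rejected, which shows how heavily structure is weighted.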
Why Fewer Tokens = Higher Signal Density
LLMs have a fixed attention budget. Every token in the context window competes for attention weights. Filling the window with boilerplate means less attention on the code that matters.
By removing noise before it reaches the model, lean-ctx increases the information density of every request. The result: higher signal-to-noise ratio, less context dilution, and the model stays within useful context limits.
10K tokens of focused context can outperform 200K of boilerplate. The model spends its attention on logic, not on JSDoc comments and import boilerplate.
Context noise dilutes the model's attention window. Removing it helps the model stay grounded in actual code structure and reduces the chance of hallucination.
Fewer input tokens means lower API costs and more messages within your rate limit. The same quota goes further - for every AI tool you use.
Measured on Real Code
Representative snapshots - your numbers will vary by file and codebase.
450 lines - map mode: 12,840 → 1,541 tokens
820 lines - signatures mode: 18,290 → 1,280 tokens
1,200 lines - aggressive mode: 31,500 → 2,835 tokens
680 lines - entropy mode: 15,400 → 2,618 tokens
340 lines - diff mode: 8,750 → 437 tokens
Benchmark
Methodology
Every number on this page is reproducible. Here's exactly how we measure.
Tokenizer
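All token counts use tiktoken with the o200k_base encoding, the tokenizer used by GPT-4o. Counts are exact for that encoding, with no estimates or approximations; models that use a different tokenizer, such as Claude, will report somewhat different totals.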
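To cross-check a count independently of lean-ctx, the same encoding can be loaded directly from the tiktoken Python package. A minimal sketch, assuming tiktoken is installed (`pip install tiktoken`); the helper names are hypothetical, and the encoder is passed in as a parameter so the counting logic can be exercised without the dependency.

```python
from typing import Callable, Sequence

Encoder = Callable[[str], Sequence[int]]


def load_o200k_encoder() -> Encoder:
    """Return an encode function backed by tiktoken's o200k_base encoding."""
    import tiktoken  # imported lazily; only needed when this loader is called

    enc = tiktoken.get_encoding("o200k_base")
    # disallowed_special=() makes text that looks like a special token
    # (e.g. "<|endoftext|>") encode as ordinary text instead of raising.
    return lambda text: enc.encode(text, disallowed_special=())


def count_tokens(text: str, encode: Encoder) -> int:
    """Number of tokens the given encoder produces for text."""
    return len(encode(text))


def count_file(path: str, encode: Encoder) -> int:
    """Token count for a UTF-8 text file."""
    with open(path, encoding="utf-8") as handle:
        return count_tokens(handle.read(), encode)


# Usage (requires tiktoken; the first call may download the encoding file):
#   encode = load_o200k_encoder()
#   print(count_file("src/auth.ts", encode))
```

The number printed for an uncompressed file should line up with the "full" row of the benchmark output for that file, assuming lean-ctx tokenizes the raw file contents the same way.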
Quality Threshold
Compressed output is only used if the composite quality score stays at or above 95%. Composite = AST preservation (50%) + identifier preservation (30%) + line coverage (20%).
Reproduce Locally
Run lean-ctx benchmark run src/ on your own codebase. The output shows exact token counts
for each compression mode, savings percentage, and quality preservation scores.
Disclaimer
Results vary by file type, size, language, and read mode. The "60-99%" range reflects real-world variance: small structured files compress more, large unstructured files compress less. Cached re-reads (~13 tokens) represent the best case.
Measure your actual savings.
Install lean-ctx and run lean-ctx benchmark run on your codebase. Real numbers, your files, your savings.
lean-ctx benchmark run src/
Works on any codebase. No config needed. Results in seconds.