lean-ctx typically saves 74-99% of tokens across a development session. This page explains where those savings come from, how to measure them, and how to maximize them.
## Where Savings Come From
Token savings accumulate across 5 layers, each multiplying the effect:
| Layer | Mechanism | Typical Savings |
|---|---|---|
| 1. Session Cache | Re-reads return ~13 token stub instead of full file | ~99% on re-reads |
| 2. Read Modes | map, signatures, entropy etc. compress first reads | 20-95% per first read |
| 3. Shell Compression | 95+ tool patterns strip boilerplate output | 50-95% per command |
| 4. TDD/CRP Mode | Abbreviated notation, symbol maps, diff-only | Additional 20-40% |
| 5. Archive + Expand | Large results stored to disk, retrieved on demand | 80-95% for large outputs |
### Compound Effect
In a typical session, the agent reads 15-30 files and runs 10-20 shell commands. With caching eliminating re-reads and compression reducing first reads, the total token savings compound to 74-99% depending on the workload.
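To see how the layers compound, here is a minimal sketch of the arithmetic, using illustrative per-item token counts and the compression rates from the table above (the specific files and numbers are hypothetical):

```python
# Compound savings for a hypothetical session: each read or command keeps
# only a fraction of its raw tokens, and the kept fractions differ per layer.
reads = [
    (4_210, 1.00),   # first full read of a file to edit (no compression)
    (8_420, 0.10),   # map-mode read keeps ~10% of raw tokens
    (4_210, 0.003),  # cached re-read returns a ~13-token stub
]
commands = [
    (2_000, 0.10),   # test-runner output, compressed to ~10%
    (600, 0.13),     # git status, compressed to ~13%
]

raw = sum(tokens for tokens, _ in reads + commands)
kept = sum(tokens * frac for tokens, frac in reads + commands)
print(f"saved {1 - kept / raw:.0%} of {raw:,} tokens")  # → saved 73% of 19,440 tokens
```

With more cached re-reads (the common case in a long session), the kept fraction keeps shrinking, which is how totals reach the upper end of the 74-99% range.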
## Measuring Your Savings

lean-ctx provides several tools for monitoring token savings in real time:
### `ctx_gain` - Session Savings Report
```
ctx_gain
→ Session gain: 89.5% (34,218 tokens saved)
  File reads: 91.2% (28,450 tokens saved, 23 reads, 14 cache hits)
  Shell output: 78.3% (5,768 tokens saved, 12 commands)
  Total cost saved: ~$0.10 (at Claude Sonnet rates)
```

### `ctx_metrics` - Detailed Breakdown
```
ctx_metrics
→ Session metrics:
  Duration: 12 min
  Tool calls: 35
  File reads: 23 (14 cache hits, 61% hit rate)
  Shell commands: 12
  Tokens: 4,018 used / 38,236 without lean-ctx
  Savings: 34,218 tokens (89.5%)
  Compression ratio: 9.5:1
```

### `ctx_benchmark` - Per-File Analysis
```
ctx_benchmark path="src/auth.ts"
→ auth.ts (123 lines, 4,210 raw tokens):
  full:       4,210 tok   (0% savings)
  map:          420 tok  (90% savings)
  signatures:   546 tok  (87% savings)
  aggressive: 2,947 tok  (30% savings)
  entropy:      842 tok  (80% savings)
  cached:        13 tok  (99.7% savings)
```

### Cost Calculation
`ctx_cost` tracks token spend attributed to individual agents and tools, enabling precise cost accounting in team environments.
| Model | Input Price | Savings per 1M Tokens Saved | Daily Savings* |
|---|---|---|---|
| Claude Sonnet 4 | $3/1M | $3.00 | ~$6-15 |
| Claude Opus 4 | $15/1M | $15.00 | ~$30-75 |
| GPT-4o | $2.50/1M | $2.50 | ~$5-12 |
| Gemini 2.5 Pro | $1.25/1M | $1.25 | ~$2.50-6 |
*Based on a typical session saving 2-5M tokens per day at an ~88% compression rate.
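The daily-savings column is just tokens saved multiplied by the per-token input price. A quick sketch of that arithmetic, using the prices listed in the table (model keys here are informal labels, not API identifiers):

```python
# Input price in dollars per million tokens, as listed in the table above.
PRICE_PER_MTOK = {
    "claude-sonnet-4": 3.00,
    "claude-opus-4": 15.00,
    "gpt-4o": 2.50,
    "gemini-2.5-pro": 1.25,
}

def daily_savings(model: str, mtok_saved_per_day: float) -> float:
    """Dollar savings for a given number of million input tokens saved per day."""
    return PRICE_PER_MTOK[model] * mtok_saved_per_day

# 2-5M tokens saved per day on Claude Sonnet 4:
low, high = daily_savings("claude-sonnet-4", 2), daily_savings("claude-sonnet-4", 5)
print(f"${low:.2f}-${high:.2f} per day")  # → $6.00-$15.00 per day
```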
## Real-World Example: Multi-File Bug Fix
A typical debugging session fixing an auth bug:
| Action | Without lean-ctx | With lean-ctx | Savings |
|---|---|---|---|
| Read auth.ts (full, first time) | 4,210 tok | 4,210 tok | 0% |
| Read server.ts (map mode) | 8,420 tok | 842 tok | 90% |
| Read db.ts (signatures) | 3,200 tok | 416 tok | 87% |
| Re-read auth.ts (cached) | 4,210 tok | 13 tok | 99.7% |
| git status | 600 tok | 80 tok | 87% |
| npm test | 2,000 tok | 200 tok | 90% |
| Re-read auth.ts (diff) | 4,210 tok | 84 tok | 98% |
| git diff | 800 tok | 120 tok | 85% |
| Total | 27,650 tok | 5,965 tok | 78.4% |
## Bidirectional Token Optimization
lean-ctx is the only context runtime that optimizes both input and output tokens simultaneously. Most alternatives address only one side of the equation:
| Tool | Input Optimization | Output Optimization | Both? |
|---|---|---|---|
| RTK (Repomix) | ✓ Context packing, file filtering | ✗ No output compression | ✗ |
| Caveman | ✗ No input optimization | ✓ Terse prompting | ✗ |
| lean-ctx | ✓ Caching, read modes, shell compression, archive, dedup | ✓ Terse Agent (3 levels), CRP mode, CEP instruction codes | ✓ |
### Input Token Savings
Achieved through the 5-layer system described above:
- Session cache - ~13-token stubs replace re-reads (99% on re-reads)
- Read modes - `map`, `signatures`, `entropy`, `aggressive` compress first reads (20-95%)
- Shell compression - 95+ tool-specific patterns strip boilerplate (50-95%)
- Archive + Expand - large results stored to disk, retrieved on demand (80-95%)
- Deduplication - `ctx_dedup` detects and eliminates redundant content in context
### Output Token Savings
Achieved through the Terse Agent system and protocol modes:
- Terse Agent - 3 compression levels applied to model output (30-75% savings)
- CRP mode - maximum-density protocol with abbreviations and symbol mapping
- CEP instruction codes - `ACT1`, `DELTA`, `1LINE`, `DIFF` control output verbosity
### Combined Effect
By reducing both input and output tokens, lean-ctx cuts cost on both sides of every exchange. The combined saving is a spend-weighted average of the two rates: a session that saves 85% on input and 50% on output, with roughly equal spend on each side, reduces total token cost by about 67% overall, more than either optimization alone.
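A minimal sketch of that arithmetic, assuming the 85%/50% rates above and an adjustable input share of total spend:

```python
def combined_savings(input_rate: float, output_rate: float,
                     input_share: float = 0.5) -> float:
    """Overall savings as a spend-weighted average of the per-stream rates.

    input_share is the fraction of total token cost attributable to input.
    """
    return input_rate * input_share + output_rate * (1 - input_share)

# 85% input savings, 50% output savings, equal spend on each side:
overall = combined_savings(0.85, 0.50)
print(f"overall savings ≈ {overall * 100:.1f}%")  # → overall savings ≈ 67.5%
```

In practice input usually dominates spend in agentic coding sessions, so raising `input_share` pushes the overall figure closer to the input rate.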
## Terse Agent
The Terse Agent system instructs the model to produce compressed output at 3 escalating levels. Each level trades readability for token efficiency, and the right level depends on the task.
| Level | Output Savings | Behavior | Best For |
|---|---|---|---|
| lite | 30-40% | Shorter sentences, no redundant explanations, concise code comments | General development, code review |
| full | 50-65% | Abbreviated notation, diff-only output, function refs (F1:42), no narration | Experienced developers, refactoring |
| ultra | 65-75% | Maximum compression: +/-/~ notation, 1 line per action, symbol abbreviations (fn, cfg, impl, deps) | High-volume sessions, CI/CD agents |
### CRP Mode Interaction
When `CRP_MODE=tdd` is set, Terse Agent automatically activates at `ultra` level with additional constraints:

- Budget limited to ≤150 tokens per response
- Zero narration - only tool calls and changed code
- Fn refs only (`F1:42`) instead of full file paths
- Structured notation: `+F1:42 param(timeout:Duration)` for additions, `-F1:10-15` for removals, `~F1:42 old→new` for modifications
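For illustration, a hypothetical CRP-mode response to a small refactor might read as follows (composed here from the notation above, not captured tool output):

```
+F1:42 param(timeout:Duration)
~F1:57 connect(url)→connect(url,timeout)
-F1:88-91
```

Three lines: one parameter added, one call site modified, one dead block removed, with no surrounding narration.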
### Per-Project Override
Set the Terse Agent level per project in `config.toml` or via an environment variable:

```toml
# config.toml
[terse]
level = "full"  # "lite", "full", or "ultra"
```

```bash
# Or via environment variable
export LCTX_TERSE_LEVEL=full
```
The level can also be changed mid-session via `ctx_session` without restarting the lean-ctx server.
## Optimization Tips

- Use `ctx_preload` at session start: Loads relevant files in optimal modes instead of multiple individual reads.
- Prefer `map` or `signatures` for context files: Only use `full` for files you'll actually edit.
- Enable TDD mode for large projects: Set `CRP_MODE=tdd` for maximum compression when working with many files.
- Use `ctx_compress` periodically: Creates checkpoints that survive context window truncation, avoiding expensive re-reads.
- Enable the archive: Large tool results are stored to disk and retrieved on demand, keeping the context window small.
- Monitor with `ctx_gain`: Check your savings periodically to identify optimization opportunities.
- Use `lines:N-M` for surgical reads: When you only need a specific function in a large file, read just those lines.
## Beyond LeanCTX: General Token Optimization
LeanCTX optimizes tool outputs (file reads, shell commands, search results). But most tokens are consumed elsewhere. Here's a breakdown of a typical AI coding session:
| Category | % of Total | Controllable? |
|---|---|---|
| Thinking tokens | 60-80% | Model choice, thinking budgets |
| Conversation history | 10-15% | Start fresh chats for new tasks |
| System prompt / rules | 3-5% | Keep rules concise |
| Open files (IDE) | 2-5% | Close irrelevant tabs |
| Subagent overhead | 5-15% | Use fewer, targeted subagents |
| Tool outputs (LeanCTX) | 3-5% | Fully optimized by LeanCTX |
### Quick wins

- Choose the right model: `composer-2-fast` (1x cost, minimal thinking) for simple tasks. `claude-4.5-sonnet` (3x) for standard work. Reserve `claude-4.6-opus-high-thinking` (10-20x) for architecture decisions.
- Close irrelevant files: Many IDEs attach open file contents to every request. Close tabs you're not editing.
- Start fresh chats: Long conversations accumulate history. Start a new chat when switching tasks.
- Minimize subagents: Each subagent duplicates the full system prompt and context. Use them only when parallel work is needed.
- Keep rules files lean: Every rule in `.cursorrules` or `AGENTS.md` is sent with every request.