lean-ctx typically saves 74-99% of tokens across a development session. This page explains where those savings come from, how to measure them, and how to maximize them.
## Where Savings Come From
Token savings accumulate across 5 layers, each multiplying the effect:
| Layer | Mechanism | Typical Savings |
|---|---|---|
| 1. Session Cache | Re-reads return ~13 token stub instead of full file | ~99% on re-reads |
| 2. Read Modes | map, signatures, entropy etc. compress first reads | 20-95% per first read |
| 3. Shell Compression | 95+ tool patterns strip boilerplate output | 50-95% per command |
| 4. TDD/CRP Mode | Abbreviated notation, symbol maps, diff-only | Additional 20-40% |
| 5. Archive + Expand | Large results stored to disk, retrieved on demand | 80-95% for large outputs |
### Compound Effect
In a typical session, the agent reads 15-30 files and runs 10-20 shell commands. With caching eliminating re-reads and compression reducing first reads, the total token savings compound to 74-99% depending on the workload.
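To see how the layers compound, here is a minimal sketch of the arithmetic, using illustrative per-item token counts and the compression rates from the table above (the specific files and numbers are hypothetical):

```python
# Compound savings for a hypothetical session: each read or command keeps
# only a fraction of its raw tokens, and the kept fractions differ per layer.
reads = [
    (4_210, 1.00),   # first full read of a file to edit (no compression)
    (8_420, 0.10),   # map-mode read keeps ~10% of raw tokens
    (4_210, 0.003),  # cached re-read returns a ~13-token stub
]
commands = [
    (2_000, 0.10),   # test-runner output, compressed to ~10%
    (600, 0.13),     # git status, compressed to ~13%
]

raw = sum(tokens for tokens, _ in reads + commands)
kept = sum(tokens * frac for tokens, frac in reads + commands)
print(f"saved {1 - kept / raw:.0%} of {raw:,} tokens")  # → saved 73% of 19,440 tokens
```

With more cached re-reads (the common case in a long session), the kept fraction keeps shrinking, which is how totals reach the upper end of the 74-99% range.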
## Measuring Your Savings

lean-ctx provides several tools for monitoring token savings in real time:
### `ctx_gain` - Session Savings Report
```
ctx_gain
→ Session gain: 89.5% (34,218 tokens saved)
  File reads: 91.2% (28,450 tokens saved, 23 reads, 14 cache hits)
  Shell output: 78.3% (5,768 tokens saved, 12 commands)
  Total cost saved: ~$0.10 (at Claude Sonnet rates)
```

### `ctx_metrics` - Detailed Breakdown
```
ctx_metrics
→ Session metrics:
  Duration: 12 min
  Tool calls: 35
  File reads: 23 (14 cache hits, 61% hit rate)
  Shell commands: 12
  Tokens: 4,018 used / 38,236 without lean-ctx
  Savings: 34,218 tokens (89.5%)
  Compression ratio: 9.5:1
```

### `ctx_benchmark` - Per-File Analysis
```
ctx_benchmark path="src/auth.ts"
→ auth.ts (123 lines, 4,210 raw tokens):
  full:       4,210 tok   (0% savings)
  map:          420 tok  (90% savings)
  signatures:   546 tok  (87% savings)
  aggressive: 2,947 tok  (30% savings)
  entropy:      842 tok  (80% savings)
  cached:        13 tok  (99.7% savings)
```

### Cost Calculation
`ctx_cost` tracks token spend attributed to individual agents and tools, enabling precise cost accounting in team environments.
| Model | Input Price | Savings per 1M Tokens Saved | Daily Savings* |
|---|---|---|---|
| Claude Sonnet 4 | $3/1M | $3.00 | ~$6-15 |
| Claude Opus 4 | $15/1M | $15.00 | ~$30-75 |
| GPT-4o | $2.50/1M | $2.50 | ~$5-12 |
| Gemini 2.5 Pro | $1.25/1M | $1.25 | ~$2.50-6 |
*Based on a typical session saving 2-5M tokens per day at an ~88% compression rate.
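The daily-savings column is just tokens saved multiplied by the per-token input price. A quick sketch of that arithmetic, using the prices listed in the table (model keys here are informal labels, not API identifiers):

```python
# Input price in dollars per million tokens, as listed in the table above.
PRICE_PER_MTOK = {
    "claude-sonnet-4": 3.00,
    "claude-opus-4": 15.00,
    "gpt-4o": 2.50,
    "gemini-2.5-pro": 1.25,
}

def daily_savings(model: str, mtok_saved_per_day: float) -> float:
    """Dollar savings for a given number of million input tokens saved per day."""
    return PRICE_PER_MTOK[model] * mtok_saved_per_day

# 2-5M tokens saved per day on Claude Sonnet 4:
low, high = daily_savings("claude-sonnet-4", 2), daily_savings("claude-sonnet-4", 5)
print(f"${low:.2f}-${high:.2f} per day")  # → $6.00-$15.00 per day
```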
## Real-World Example: Multi-File Bug Fix
A typical debugging session fixing an auth bug:
| Action | Without lean-ctx | With lean-ctx | Savings |
|---|---|---|---|
| Read auth.ts (full, first time) | 4,210 tok | 4,210 tok | 0% |
| Read server.ts (map mode) | 8,420 tok | 842 tok | 90% |
| Read db.ts (signatures) | 3,200 tok | 416 tok | 87% |
| Re-read auth.ts (cached) | 4,210 tok | 13 tok | 99.7% |
| git status | 600 tok | 80 tok | 87% |
| npm test | 2,000 tok | 200 tok | 90% |
| Re-read auth.ts (diff) | 4,210 tok | 84 tok | 98% |
| git diff | 800 tok | 120 tok | 85% |
| Total | 27,650 tok | 5,965 tok | 78.4% |
## Bidirectional Token Optimization
lean-ctx is the only context runtime that optimizes both input and output tokens simultaneously. Most alternatives address only one side of the equation:
| Tool | Input Optimization | Output Optimization | Both? |
|---|---|---|---|
| RTK (Repomix) | ✓ Context packing, file filtering | ✗ No output compression | ✗ |
| Caveman | ✗ No input optimization | ✓ Terse prompting | ✗ |
| lean-ctx | ✓ Caching, read modes, shell compression, archive, dedup | ✓ Terse Agent (3 levels), CRP mode, CEP instruction codes | ✓ |
### Input Token Savings
Achieved through the 5-layer system described above:
- Session cache - ~13-token stubs replace re-reads (99% on re-reads)
- Read modes - `map`, `signatures`, `entropy`, `aggressive` compress first reads (20-95%)
- Shell compression - 95+ tool-specific patterns strip boilerplate (50-95%)
- Archive + Expand - large results stored to disk, retrieved on demand (80-95%)
- Deduplication - `ctx_dedup` detects and eliminates redundant content in context
### Output Token Savings
Achieved through the Terse Agent system and protocol modes:
- Terse Agent - 3 compression levels applied to model output (30-75% savings)
- CRP mode - maximum-density protocol with abbreviations and symbol mapping
- CEP instruction codes - `ACT1`, `DELTA`, `1LINE`, `DIFF` control output verbosity
### Combined Effect
By reducing both input and output tokens, lean-ctx cuts cost on both sides of every exchange. The combined saving is a spend-weighted average of the two rates: a session that saves 85% on input and 50% on output, with roughly equal spend on each side, reduces total token cost by about 67% overall, more than either optimization alone.
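A minimal sketch of that arithmetic, assuming the 85%/50% rates above and an adjustable input share of total spend:

```python
def combined_savings(input_rate: float, output_rate: float,
                     input_share: float = 0.5) -> float:
    """Overall savings as a spend-weighted average of the per-stream rates.

    input_share is the fraction of total token cost attributable to input.
    """
    return input_rate * input_share + output_rate * (1 - input_share)

# 85% input savings, 50% output savings, equal spend on each side:
overall = combined_savings(0.85, 0.50)
print(f"overall savings ≈ {overall * 100:.1f}%")  # → overall savings ≈ 67.5%
```

In practice input usually dominates spend in agentic coding sessions, so raising `input_share` pushes the overall figure closer to the input rate.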
## Terse Agent
The Terse Agent system instructs the model to produce compressed output at 3 escalating levels. Each level trades readability for token efficiency, and the right level depends on the task.
| Level | Output Savings | Behavior | Best For |
|---|---|---|---|
| lite | 30-40% | Shorter sentences, no redundant explanations, concise code comments | General development, code review |
| full | 50-65% | Abbreviated notation, diff-only output, function refs (F1:42), no narration | Experienced developers, refactoring |
| ultra | 65-75% | Maximum compression: +/-/~ notation, 1 line per action, symbol abbreviations (fn, cfg, impl, deps) | High-volume sessions, CI/CD agents |
### CRP Mode Interaction
When `CRP_MODE=tdd` is set, Terse Agent automatically activates at `ultra` level with additional constraints:

- Budget limited to ≤150 tokens per response
- Zero narration - only tool calls and changed code
- Fn refs only (`F1:42`) instead of full file paths
- Structured notation: `+F1:42 param(timeout:Duration)` for additions, `-F1:10-15` for removals, `~F1:42 old→new` for modifications
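For illustration, a hypothetical CRP-mode response to a small refactor might read as follows (composed here from the notation above, not captured tool output):

```
+F1:42 param(timeout:Duration)
~F1:57 connect(url)→connect(url,timeout)
-F1:88-91
```

Three lines: one parameter added, one call site modified, one dead block removed, with no surrounding narration.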
### Per-Project Override
Set the Terse Agent level per project in `config.toml` or via an environment variable:

```toml
# config.toml
[terse]
level = "full"  # "lite", "full", or "ultra"
```

```bash
# Or via environment variable
export LCTX_TERSE_LEVEL=full
```
The level can also be changed mid-session via `ctx_session` without restarting the lean-ctx server.
## Optimization Tips

- Use `ctx_preload` at session start: Loads relevant files in optimal modes instead of multiple individual reads.
- Prefer `map` or `signatures` for context files: Only use `full` for files you'll actually edit.
- Enable TDD mode for large projects: Set `CRP_MODE=tdd` for maximum compression when working with many files.
- Use `ctx_compress` periodically: Creates checkpoints that survive context window truncation, avoiding expensive re-reads.
- Enable the archive: Large tool results are stored to disk and retrieved on demand, keeping the context window small.
- Monitor with `ctx_gain`: Check your savings periodically to identify optimization opportunities.
- Use `lines:N-M` for surgical reads: When you only need a specific function in a large file, read just those lines.
## Beyond LeanCTX: General Token Optimization
LeanCTX optimizes tool outputs (file reads, shell commands, search results). But most tokens are consumed elsewhere. Here's a breakdown of a typical AI coding session:
| Category | % of Total | Controllable? |
|---|---|---|
| Thinking tokens | 60-80% | Model choice, thinking budgets |
| Conversation history | 10-15% | Start fresh chats for new tasks |
| System prompt / rules | 3-5% | Keep rules concise |
| Open files (IDE) | 2-5% | Close irrelevant tabs |
| Subagent overhead | 5-15% | Use fewer, targeted subagents |
| Tool outputs (LeanCTX) | 3-5% | Fully optimized by LeanCTX |
### Quick wins

- Choose the right model: `composer-2-fast` (1x cost, minimal thinking) for simple tasks. `claude-4.5-sonnet` (3x) for standard work. Reserve `claude-4.6-opus-high-thinking` (10-20x) for architecture decisions.
- Close irrelevant files: Many IDEs attach open file contents to every request. Close tabs you're not editing.
- Start fresh chats: Long conversations accumulate history. Start a new chat when switching tasks.
- Minimize subagents: Each subagent duplicates the full system prompt and context. Use them only when parallel work is needed.
- Keep rules files lean: Every rule in `.cursorrules` or `AGENTS.md` is sent with every request.