
Token Savings - How lean-ctx Reduces Costs

Deep dive into how lean-ctx achieves 74-99% token savings: file caching, read mode compression, shell patterns, TDD mode, and the archive system.

lean-ctx typically saves 74-99% of tokens across a development session. This page explains where those savings come from, how to measure them, and how to maximize them.

Where Savings Come From

Token savings accumulate across 5 layers, each multiplying the effect:

| Layer | Mechanism | Typical Savings |
|---|---|---|
| 1. Session Cache | Re-reads return a ~13-token stub instead of the full file | ~99% on re-reads |
| 2. Read Modes | map, signatures, entropy, etc. compress first reads | 20-95% per first read |
| 3. Shell Compression | 95+ tool patterns strip boilerplate output | 50-95% per command |
| 4. TDD/CRP Mode | Abbreviated notation, symbol maps, diff-only | Additional 20-40% |
| 5. Archive + Expand | Large results stored to disk, retrieved on demand | 80-95% for large outputs |

Compound Effect

In a typical session, the agent reads 15-30 files and runs 10-20 shell commands. With caching eliminating re-reads and compression reducing first reads, the total token savings compound to 74-99% depending on the workload.
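
The compound arithmetic is straightforward to model. A minimal sketch with illustrative numbers - the read counts, file sizes, and rates below are assumptions for the example, not lean-ctx defaults:

```python
# Back-of-envelope model: first reads are compressed, re-reads hit the cache.
RAW_TOKENS_PER_FILE = 3_000     # assumed average file size in tokens
FIRST_READS = 10                # files read for the first time
RE_READS = 12                   # repeat reads served from the session cache
FIRST_READ_COMPRESSION = 0.70   # assumed average read-mode savings
CACHE_STUB_TOKENS = 13          # stub returned on a cache hit

baseline = (FIRST_READS + RE_READS) * RAW_TOKENS_PER_FILE
actual = (FIRST_READS * RAW_TOKENS_PER_FILE * (1 - FIRST_READ_COMPRESSION)
          + RE_READS * CACHE_STUB_TOKENS)
savings = 1 - actual / baseline
print(f"{savings:.1%}")  # → 86.1%
```

Even with modest compression on first reads, the cache-hit term collapses re-read cost to near zero, which is where most of the compounding comes from.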

Measuring Your Savings

lean-ctx provides several tools to monitor token savings in real-time:

ctx_gain - Session Savings Report

ctx_gain
→ Session gain: 89.5% (34,218 tokens saved)
  File reads:   91.2% (28,450 tokens saved, 23 reads, 14 cache hits)
  Shell output: 78.3% (5,768 tokens saved, 12 commands)
  Total cost saved: ~$0.10 (at Claude Sonnet input rates)

ctx_metrics - Detailed Breakdown

ctx_metrics
→ Session metrics:
  Duration: 12 min
  Tool calls: 35
  File reads: 23 (14 cache hits, 61% hit rate)
  Shell commands: 12
  Tokens: 4,018 used / 38,236 without lean-ctx
  Savings: 34,218 tokens (89.5%)
  Compression ratio: 9.5:1

ctx_benchmark - Per-File Analysis

ctx_benchmark path="src/auth.ts"
→ auth.ts (123 lines, 4,210 raw tokens):
  full:       4,210 tok (0% savings)
  map:          420 tok (90% savings)
  signatures:   546 tok (87% savings)
  aggressive: 2,947 tok (30% savings)
  entropy:      842 tok (80% savings)
  cached:        13 tok (99.7% savings)
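
Each savings figure is just 1 - compressed/raw, so the report can be sanity-checked with a few lines:

```python
raw = 4_210  # raw token count reported for auth.ts
modes = {"map": 420, "signatures": 546, "aggressive": 2_947,
         "entropy": 842, "cached": 13}
for mode, tokens in modes.items():
    print(f"{mode:<11} {1 - tokens / raw:.1%} savings")
# map → 90.0%, signatures → 87.0%, ..., cached → 99.7%
```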

Cost Calculation

ctx_cost tracks token spend attributed to individual agents and tools, enabling precise cost accounting in team environments.

| Model | Input Price | Value of 1M Tokens Saved | Daily Savings* |
|---|---|---|---|
| Claude Sonnet 4 | $3/1M | $3.00 | ~$6-15 |
| Claude Opus 4 | $15/1M | $15.00 | ~$30-75 |
| GPT-4o | $2.50/1M | $2.50 | ~$5-12 |
| Gemini 2.5 Pro | $1.25/1M | $1.25 | ~$2.50-6 |

*Based on a typical session saving 2-5M tokens per day at an ~88% compression rate.
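
The table reduces to one multiplication. A sketch (the price-map keys are illustrative labels, not a lean-ctx API; check current provider rates):

```python
# $ per 1M input tokens, as listed in the table above
PRICES = {"claude-sonnet-4": 3.00, "claude-opus-4": 15.00,
          "gpt-4o": 2.50, "gemini-2.5-pro": 1.25}

def daily_savings(tokens_saved: int, model: str) -> float:
    """Dollar value of input tokens avoided at a given model's rate."""
    return tokens_saved / 1_000_000 * PRICES[model]

print(daily_savings(5_000_000, "claude-sonnet-4"))  # → 15.0
```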

Real-World Example: 5-File Bug Fix

A typical debugging session fixing an auth bug:

| Action | Without lean-ctx | With lean-ctx | Savings |
|---|---|---|---|
| Read auth.ts (full, first time) | 4,210 tok | 4,210 tok | 0% |
| Read server.ts (map mode) | 8,420 tok | 842 tok | 90% |
| Read db.ts (signatures) | 3,200 tok | 416 tok | 87% |
| Re-read auth.ts (cached) | 4,210 tok | 13 tok | 99.7% |
| git status | 600 tok | 80 tok | 87% |
| npm test | 2,000 tok | 200 tok | 90% |
| Re-read auth.ts (diff) | 4,210 tok | 84 tok | 98% |
| git diff | 800 tok | 120 tok | 85% |
| Total | 27,650 tok | 5,965 tok | 78.4% |
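
Totalling the rows confirms the bottom line:

```python
# (without lean-ctx, with lean-ctx) token counts per row of the table above
steps = [(4_210, 4_210), (8_420, 842), (3_200, 416), (4_210, 13),
         (600, 80), (2_000, 200), (4_210, 84), (800, 120)]
without = sum(w for w, _ in steps)
with_ctx = sum(c for _, c in steps)
print(without, with_ctx, f"{1 - with_ctx / without:.1%}")  # → 27650 5965 78.4%
```

Note that the single cached re-read saves more in absolute terms (4,197 tokens) than all three shell commands combined (3,000 tokens).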

Bidirectional Token Optimization

lean-ctx is the only context runtime that optimizes both input and output tokens simultaneously. Most alternatives address only one side of the equation:

| Tool | Input Optimization | Output Optimization | Both? |
|---|---|---|---|
| RTK (Repomix) | ✓ Context packing, file filtering | ✗ No output compression | ✗ |
| Caveman | ✗ No input optimization | ✓ Terse prompting | ✗ |
| lean-ctx | ✓ Caching, read modes, shell compression, archive, dedup | ✓ Terse Agent (3 levels), CRP mode, CEP instruction codes | ✓ |

Input Token Savings

Achieved through the 5-layer system described above:

  • Session cache - ~13-token stubs replace re-reads (99% on re-reads)
  • Read modes - map, signatures, entropy, aggressive compress first reads (20-95%)
  • Shell compression - 95+ tool-specific patterns strip boilerplate (50-95%)
  • Archive + Expand - large results stored to disk, retrieved on demand (80-95%)
  • Deduplication - ctx_dedup detects and eliminates redundant content in context

Output Token Savings

Achieved through the Terse Agent system and protocol modes:

  • Terse Agent - 3 compression levels applied to model output (30-75% savings)
  • CRP mode - maximum density protocol with abbreviations and symbol mapping
  • CEP instruction codes - ACT1, DELTA, 1LINE, DIFF control output verbosity

Combined Effect

By reducing both input and output tokens, lean-ctx compounds savings on both sides of every exchange. A session that saves 85% on input and 50% on output, with input and output contributing roughly equal token volume, cuts total token cost by about 67% overall - far more than either optimization alone.
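
The combined figure is a weighted average of the two rates. Assuming an even input/output split, as in the example:

```python
input_share, output_share = 0.5, 0.5      # assumed token mix
input_savings, output_savings = 0.85, 0.50
remaining = (input_share * (1 - input_savings)
             + output_share * (1 - output_savings))
print(f"{1 - remaining:.1%}")  # → 67.5%
```

In sessions where input tokens dominate, the blended rate moves toward the input-side figure.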

Terse Agent

The Terse Agent system instructs the model to produce compressed output at 3 escalating levels. Each level trades readability for token efficiency, and the right level depends on the task.

| Level | Output Savings | Behavior | Best For |
|---|---|---|---|
| lite | 30-40% | Shorter sentences, no redundant explanations, concise code comments | General development, code review |
| full | 50-65% | Abbreviated notation, diff-only output, function refs (F1:42), no narration | Experienced developers, refactoring |
| ultra | 65-75% | Maximum compression: +/-/~ notation, 1 line per action, symbol abbreviations (fn, cfg, impl, deps) | High-volume sessions, CI/CD agents |

CRP Mode Interaction

When CRP_MODE=tdd is set, Terse Agent automatically activates at ultra level with additional constraints:

  • Budget limited to ≤150 tokens per response
  • Zero narration - only tool calls and changed code
  • Fn refs only (F1:42) instead of full file paths
  • Structured notation: +F1:42 param(timeout:Duration) for additions, -F1:10-15 for removals, ~F1:42 old→new for modifications

Per-Project Override

Set the Terse Agent level per project in config.toml or via environment variable:

# config.toml
[terse]
level = "full"    # "lite", "full", or "ultra"

# Or via environment variable
export LCTX_TERSE_LEVEL=full

The level can also be changed mid-session via ctx_session without restarting the lean-ctx server.

Optimization Tips

  1. Use ctx_preload at session start: Loads relevant files in optimal modes instead of multiple individual reads.
  2. Prefer map or signatures for context files: Only use full for files you'll actually edit.
  3. Enable TDD mode for large projects: Set CRP_MODE=tdd for maximum compression when working with many files.
  4. Use ctx_compress periodically: Creates checkpoints that survive context window truncation, avoiding expensive re-reads.
  5. Enable the archive: Large tool results are stored to disk and retrieved on demand, keeping the context window small.
  6. Monitor with ctx_gain: Check your savings periodically to identify optimization opportunities.
  7. Use lines:N-M for surgical reads: When you only need a specific function in a large file, read just those lines.

Beyond lean-ctx: General Token Optimization

lean-ctx optimizes tool outputs (file reads, shell commands, search results). But most tokens in a session are consumed elsewhere. Here's a breakdown of a typical AI coding session:

| Category | % of Total | How to Control It |
|---|---|---|
| Thinking tokens | 60-80% | Model choice, thinking budgets |
| Conversation history | 10-15% | Start fresh chats for new tasks |
| System prompt / rules | 3-5% | Keep rules concise |
| Open files (IDE) | 2-5% | Close irrelevant tabs |
| Subagent overhead | 5-15% | Use fewer, targeted subagents |
| Tool outputs (lean-ctx) | 3-5% | Fully optimized by lean-ctx |

Quick wins

  • Choose the right model: composer-2-fast (1x cost, minimal thinking) for simple tasks. claude-4.5-sonnet (3x) for standard work. Reserve claude-4.6-opus-high-thinking (10-20x) for architecture decisions.
  • Close irrelevant files: Many IDEs attach open file contents to every request. Close tabs you're not editing.
  • Start fresh chats: Long conversations accumulate history. Start a new chat when switching tasks.
  • Minimize subagents: Each subagent duplicates the full system prompt and context. Use them only when parallel work is needed.
  • Keep rules files lean: Every rule in .cursorrules or AGENTS.md is sent with every request.