Journey: Hermes: lean-ctx as Context Engine

Flip the integration around: instead of being an MCP server your agent calls, lean-ctx becomes Hermes Agent's active context engine. It keeps the system preamble and a fresh tail verbatim, replaces older turns with a recoverable summary, offloads the raw turns into durable memory, and hands the agent first-class recall tools to page them back in losslessly.

Journey 24 · Scale & Teams · Compress · Remember

You are: making lean-ctx your agent framework's context engine

Covers: ctx_transcript_compact, serve, context.engine, ctx_search, ctx_handoff, LEANCTX_*

You’ve used lean-ctx as an MCP server an agent calls. This journey flips that around: lean-ctx becomes the component an agent framework delegates its context window to. With the hermes-lean-ctx plugin, lean-ctx replaces Hermes Agent’s built-in ContextCompressor and owns what stays in the window — keeping the system preamble and a fresh tail verbatim, summarizing older turns recoverably, offloading the raw turns into durable memory, and giving the agent first-class recall tools to page that memory back in.

1. Engine vs. MCP server — the shift

As an MCP server, lean-ctx only answers the ctx_* calls a model chooses to make; it never sees the whole conversation. As a context engine, the host hands lean-ctx the entire message array on every turn and asks it to compact it. lean-ctx decides what stays verbatim, what becomes a recoverable summary, and what is offloaded into session memory.

Only one context engine can be active at a time, so lean-ctx and Hermes’ built-in compressor (or hermes-lcm) are mutually exclusive — you pick one.

2. The compaction core — `ctx_transcript_compact`

What it does: Compacts an OpenAI-format message array deterministically. It keeps the system preamble + a fresh tail verbatim, replaces older turns with a recoverable summary, and offloads the raw turns into lean-ctx session memory so they stay retrievable.

ctx_transcript_compact messages=<OpenAI message array>
                        fresh_tail_tokens=4000      # recent tokens kept verbatim
                        protect_min_messages=6      # min recent messages kept verbatim
                        focus_topic="auth refactor" # optional: bias the summary

It’s the 77th MCP tool, exposed on both MCP and the HTTP /v1 tools API, so the plugin, the CLI and every other client get identical, tested behaviour. Compaction holds two invariants: a tool_call and its tool_result are never split across the boundary, and the output is byte-stable for the same input so it preserves the provider’s prompt-cache prefix.

Returns {messages, stats} — the compacted array plus deterministic stats.
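To make the boundary rule concrete, here is a minimal Python sketch of how a message array could be split into preamble, older turns and fresh tail. It is not lean-ctx's implementation: the function names, the 4-characters-per-token estimate, and the reading of fresh_tail_tokens and protect_min_messages as two minimums the tail must both satisfy are assumptions made for illustration.

```python
from typing import Any

Message = dict[str, Any]


def estimate_tokens(message: Message) -> int:
    """Crude stand-in for a tokenizer (assumption): about 4 characters per token."""
    return max(1, len(str(message.get("content") or "")) // 4)


def split_for_compaction(
    messages: list[Message],
    fresh_tail_tokens: int = 4000,
    protect_min_messages: int = 6,
) -> tuple[list[Message], list[Message], list[Message]]:
    """Return (preamble, older, tail); only `older` would be summarized."""
    # Leading system messages form the preamble and stay verbatim.
    start = 0
    while start < len(messages) and messages[start].get("role") == "system":
        start += 1

    # Walk backwards, growing the tail until both minimums are satisfied.
    cut = len(messages)
    budget = 0
    while cut > start:
        kept = len(messages) - cut
        if kept >= protect_min_messages and budget >= fresh_tail_tokens:
            break
        cut -= 1
        budget += estimate_tokens(messages[cut])

    # Never start the tail on a tool result: pull the boundary back until the
    # assistant message that issued the tool call is inside the tail as well.
    while start < cut < len(messages) and messages[cut].get("role") == "tool":
        cut -= 1

    return messages[:start], messages[start:cut], messages[cut:]
```

Because the split depends only on the input array and the two parameters, the same input always yields the same boundary, which is the property a byte-stable output relies on.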

3. Install and activate the engine

# 1. Install the plugin (symlinks into ~/.hermes/plugins).
cd integrations/hermes-lean-ctx && ./scripts/install.sh

# 2. Start the lean-ctx HTTP tools API (serves /v1; default port 8080).
#    NOTE: the always-on proxy (4444+) does NOT serve /v1/tools — use `serve`.
lean-ctx serve --host 127.0.0.1 --port 8080

# 3. Install the SDK in Hermes' Python.
pip install lean-ctx-client

# 4. Activate the engine in ~/.hermes/config.yaml:
#    context:
#      engine: "lean-ctx"

lean-ctx init --agent hermes prints this same engine-plugin hint, so onboarding points you here automatically. If your server isn’t on the default host and port, set LEANCTX_BASE_URL (and LEANCTX_TOKEN if you ran serve --auth-token).
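As a rough illustration of how those two variables might be resolved on the client side, the sketch below reads them from the environment. The helper name resolve_server_settings is hypothetical, and the fallback URL is an assumption taken from the serve command above; the actual lean-ctx-client SDK may resolve its settings differently.

```python
import os


def resolve_server_settings(env=None):
    """Return (base_url, token) for the lean-ctx HTTP tools API (hypothetical helper)."""
    env = os.environ if env is None else env
    # Assumed default: the host and port used in the `serve` command above.
    base_url = env.get("LEANCTX_BASE_URL", "http://127.0.0.1:8080").rstrip("/")
    # Only needed when the server was started with `serve --auth-token`.
    token = env.get("LEANCTX_TOKEN") or None
    return base_url, token
```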

4. How a turn flows

Hermes agent loop
   └─ ContextEngine ABC ── LeanCtxEngine (thin adapter)
                               └─ leanctx SDK ── HTTP /v1 ── lean-ctx daemon
                                                              └─ ctx_transcript_compact,
                                                                 ctx_search, ctx_knowledge, …

compress(messages) keeps the system preamble + fresh tail verbatim and replaces older turns with a recoverable summary, calling the daemon’s ctx_transcript_compact. If the daemon is unreachable it falls back to a pure Python compaction, so the agent loop never breaks.
Native recall tools inject ctx_search, ctx_semantic_search, ctx_read, ctx_expand, ctx_knowledge and ctx_summary into the agent’s tool list, so the model can pull detail back in on demand after a compaction.
Cross-session persistence runs on the session lifecycle: resume on start, ctx_summary + a deterministic ctx_handoff ledger on end.
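The "never breaks the agent loop" behaviour of compress(messages) can be pictured as a try-remote, fall-back-local wrapper. This is a sketch of the pattern only: compress_with_fallback and both callables are hypothetical names, and the plugin's real error handling may cover more cases than the network errors caught here.

```python
from typing import Any, Callable

Message = dict[str, Any]
Compactor = Callable[[list[Message]], list[Message]]


def compress_with_fallback(
    messages: list[Message],
    remote_compact: Compactor,
    local_compact: Compactor,
) -> list[Message]:
    """Prefer the daemon; fall back to local compaction if it cannot be reached."""
    try:
        return remote_compact(messages)
    except OSError:
        # ConnectionError and TimeoutError are OSError subclasses, so an
        # unreachable daemon degrades to local compaction instead of failing the turn.
        return local_compact(messages)
```

Passing the two compactors in as callables keeps the wrapper independent of any particular HTTP client and makes the fallback path easy to exercise in a test.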

5. Tuning — `LEANCTX_*`

Compaction fires when the conversation crosses LEANCTX_THRESHOLD_FRACTION (default 0.75) of the model window. LEANCTX_PROTECT_FRACTION (0.25), LEANCTX_PROTECT_MIN_MESSAGES (6) and LEANCTX_PROTECT_MIN_TOKENS (2000) size the verbatim tail. LEANCTX_ENABLE_TOOLS toggles the native recall tools, and LEANCTX_CORE_COMPACTION selects the daemon tool rather than the local fallback. Window size is inferred from the model name until Hermes calls update_model(context_length=…), which always wins.
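A small worked example helps with the arithmetic. The sketch below uses the values quoted above as fallbacks; the function names are hypothetical, and the way the protect fraction and the token minimum combine (the larger of the two wins, with the fraction applied to the model window) is an assumption, since the text does not spell it out.

```python
import os


def should_compact(used_tokens: int, context_length: int, env=None) -> bool:
    """True once usage crosses the threshold fraction of the model window."""
    env = os.environ if env is None else env
    threshold = float(env.get("LEANCTX_THRESHOLD_FRACTION", 0.75))
    return used_tokens > threshold * context_length


def tail_token_budget(context_length: int, env=None) -> int:
    """Token budget for the verbatim tail under the assumed combination rule."""
    env = os.environ if env is None else env
    fraction = float(env.get("LEANCTX_PROTECT_FRACTION", 0.25))
    floor = int(env.get("LEANCTX_PROTECT_MIN_TOKENS", 2000))
    # Assumption: the fraction applies to the model window, floored by the minimum.
    return max(int(fraction * context_length), floor)
```

Under these assumptions, a 128,000-token window compacts once usage passes 96,000 tokens and protects a 32,000-token tail, while a 4,000-token window falls back to the 2,000-token minimum.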

6. Why lean-ctx over the alternatives

Capability	built-in compressor	`hermes-lcm`	`hermes-lean-ctx`
Recall after compaction	lossy	lossless (grep/expand)	lossless (`ctx_search`/`ctx_expand`/`ctx_knowledge`)
Cross-session memory	no	per-project	sessions, knowledge, handoff ledgers
Determinism / prompt-cache	n/a	partial	deterministic, byte-stable
Engine location	in-agent	in-plugin	in the daemon (single source of truth)

The engine lives in the daemon, so its behaviour is exactly what this reference documents, and every client picks up improvements at once whenever lean-ctx itself improves.

Where the neighbouring topics live

Topic	Journey
Build your own agent on the `/v1` API + SDKs	Build Your Own Agent
Sessions, knowledge & handoffs	Memory & Knowledge
The proxy, `serve` & providers	Proxy & Power Integrations
Writing your own plugins / WASM	Extend Without Forking