Journeys
Journey: Hermes: lean-ctx as Context Engine
Flip the integration around: instead of being an MCP server your agent calls, lean-ctx becomes Hermes Agent's active context engine. It keeps the system preamble and a fresh tail verbatim, replaces older turns with a recoverable summary, offloads the raw turns into durable memory, and hands the agent first-class recall tools to page them back in losslessly.
You aremaking lean-ctx your agent framework's context engine
ctx_transcript_compactservecontext.enginectx_searchctx_handoffLEANCTX_*
You’ve used lean-ctx as an MCP server an agent calls. This journey flips that
around: lean-ctx becomes the component an agent framework delegates its context
window to. With the hermes-lean-ctx plugin, lean-ctx replaces Hermes Agent’s
built-in ContextCompressor and owns what stays in the window — keeping the
system preamble and a fresh tail verbatim, summarizing older turns recoverably,
offloading the raw turns into durable memory, and giving the agent first-class
recall tools to page that memory back in.
1. Engine vs. MCP server — the shift
As an MCP server, lean-ctx only answers the ctx_* calls a model chooses to
make; it never sees the whole conversation. As a context engine, the host
hands lean-ctx the entire message array on every turn and asks it to compact it.
lean-ctx decides what stays verbatim, what becomes a recoverable summary, and
what is offloaded into session memory.
Only one context engine can be active at a time, so lean-ctx and Hermes’ built-in
compressor (or hermes-lcm) are mutually exclusive — you pick one.
2. The compaction core — ctx_transcript_compact
What it does: Compacts an OpenAI-format message array deterministically. It keeps the system preamble + a fresh tail verbatim, replaces older turns with a recoverable summary, and offloads the raw turns into lean-ctx session memory so they stay retrievable.
ctx_transcript_compact messages=<OpenAI message array>
fresh_tail_tokens=4000 # recent tokens kept verbatim
protect_min_messages=6 # min recent messages kept verbatim
focus_topic="auth refactor" # optional: bias the summary
It’s the 77th MCP tool, exposed on both MCP and the HTTP /v1 tools API, so
the plugin, the CLI and every other client get identical, tested behaviour.
Compaction holds two invariants: a tool_call and its tool_result are never
split across the boundary, and the output is byte-stable for the same input
so it preserves the provider’s prompt-cache prefix.
Returns {messages, stats} — the compacted array plus deterministic stats.
3. Install and activate the engine
# 1. Install the plugin (symlinks into ~/.hermes/plugins).
cd integrations/hermes-lean-ctx && ./scripts/install.sh
# 2. Start the lean-ctx HTTP tools API (serves /v1; default port 8080).
# NOTE: the always-on proxy (4444+) does NOT serve /v1/tools — use `serve`.
lean-ctx serve --host 127.0.0.1 --port 8080
# 3. Install the SDK in Hermes' Python.
pip install lean-ctx-client
# 4. Activate the engine in ~/.hermes/config.yaml:
# context:
# engine: "lean-ctx"
lean-ctx init --agent hermes prints this same engine-plugin hint, so onboarding
points you here automatically. If your server isn’t on the default, set
LEANCTX_BASE_URL (and LEANCTX_TOKEN if you ran serve --auth-token).
4. How a turn flows
Hermes agent loop
└─ ContextEngine ABC ── LeanCtxEngine (thin adapter)
└─ leanctx SDK ── HTTP /v1 ── lean-ctx daemon
└─ ctx_transcript_compact,
ctx_search, ctx_knowledge, …
compress(messages)keeps the system preamble + fresh tail verbatim and replaces older turns with a recoverable summary, calling the daemon’sctx_transcript_compact. If the daemon is unreachable it falls back to a pure Python compaction, so the agent loop never breaks.- Native recall tools inject
ctx_search,ctx_semantic_search,ctx_read,ctx_expand,ctx_knowledgeandctx_summaryinto the agent’s tool list, so the model can pull detail back in on demand after a compaction. - Cross-session persistence runs on the session lifecycle:
resumeon start,ctx_summary+ a deterministicctx_handoffledger on end.
5. Tuning — LEANCTX_*
Compaction fires when the window crosses LEANCTX_THRESHOLD_FRACTION (0.75) of
the model window; LEANCTX_PROTECT_FRACTION (0.25), LEANCTX_PROTECT_MIN_MESSAGES
(6) and LEANCTX_PROTECT_MIN_TOKENS (2000) size the verbatim tail.
LEANCTX_ENABLE_TOOLS toggles the native recall tools and LEANCTX_CORE_COMPACTION
chooses the daemon tool over the local fallback. Window size is inferred from the
model name until Hermes calls update_model(context_length=…), which always wins.
6. Why lean-ctx over the alternatives
| built-in compressor | hermes-lcm | hermes-lean-ctx | |
|---|---|---|---|
| Recall after compaction | lossy | lossless (grep/expand) | lossless (ctx_search/ctx_expand/ctx_knowledge) |
| Cross-session memory | no | per-project | sessions, knowledge, handoff ledgers |
| Determinism / prompt-cache | n/a | partial | deterministic, byte-stable |
| Engine location | in-agent | in-plugin | in the daemon (single source of truth) |
The engine lives in the daemon, so its behaviour is the same one this whole reference documents — and it improves for every client at once when lean-ctx does.
Where the neighbouring topics live
| Topic | Journey |
|---|---|
Build your own agent on the /v1 API + SDKs | Build Your Own Agent |
| Sessions, knowledge & handoffs | Memory & Knowledge |
The proxy, serve & providers | Proxy & Power Integrations |
| Writing your own plugins / WASM | Extend Without Forking |