
Tree-sitter Engine

AST-powered signature extraction using tree-sitter for 18 programming languages.

Since v1.5.0, LeanCTX uses tree-sitter for signature extraction instead of line-by-line regex matching. Tree-sitter parses source code into a full Abstract Syntax Tree (AST), enabling accurate extraction of multi-line signatures, arrow functions, decorators, and nested definitions that line-by-line regex matching misses.

Why tree-sitter?

Capability               | Regex (old)           | tree-sitter (new)
Multi-line signatures    | Missed                | Fully parsed
Arrow functions          | Missed                | Fully parsed
Nested classes / methods | Indentation heuristic | AST scope tracking
Decorators / attributes  | Ignored               | Associated with definitions
Languages supported      | 4                     | 18
Accuracy                 | ~85%                  | ~99%

Supported Languages

Each language has dedicated tree-sitter queries that extract the following definition types:

Language     | Extensions        | Extracted Definitions
TypeScript   | .ts, .tsx         | function, class, abstract class, interface, type alias, method, arrow function
JavaScript   | .js, .jsx         | function, class, method, arrow function
Rust         | .rs               | fn, struct, enum, trait, impl, type, const
Python       | .py               | def, class, async def
Go           | .go               | func, method, type (struct/interface)
Java         | .java             | class, interface, enum, method, constructor
C            | .c, .h            | function, struct, enum, typedef
C++          | .cpp, .cc, .hpp   | function, class, struct, enum, namespace
Ruby         | .rb               | class, module, method, singleton_method
C#           | .cs               | class, struct, interface, method, property, record
Kotlin       | .kt, .kts         | class, object, fun, property, interface
Swift        | .swift            | class, struct, enum, func, protocol, extension
PHP          | .php              | class, function, interface, trait, namespace
Bash / Shell | .sh, .bash        | function
Dart         | .dart             | class, mixin, extension, function, method, enum
Scala        | .scala, .sc       | class, object, trait, def, val
Elixir       | .ex, .exs         | defmodule, def, defp, defmacro
Zig          | .zig              | fn, struct, enum, union

Svelte (.svelte), Vue (.vue), and other unsupported extensions automatically fall back to the regex-based extractor for TS/JS-like syntax.
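
The extension-based routing described above can be pictured as a simple lookup with a fallback. The following is a minimal sketch, not LeanCTX's actual code: the `Extractor` enum and `pick_extractor` function are hypothetical names introduced for illustration, and the match is assumed to be case-sensitive.

```rust
use std::path::Path;

/// Which extractor a file is routed to (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
enum Extractor {
    /// A tree-sitter grammar, identified here by language name.
    TreeSitter(&'static str),
    /// The regex-based fallback extractor.
    RegexFallback,
}

/// Map a file path to an extractor using the extension table above.
fn pick_extractor(path: &str) -> Extractor {
    let ext = Path::new(path)
        .extension()
        .and_then(|e| e.to_str())
        .unwrap_or("");
    let lang = match ext {
        "ts" | "tsx" => "TypeScript",
        "js" | "jsx" => "JavaScript",
        "rs" => "Rust",
        "py" => "Python",
        "go" => "Go",
        "java" => "Java",
        "c" | "h" => "C",
        "cpp" | "cc" | "hpp" => "C++",
        "rb" => "Ruby",
        "cs" => "C#",
        "kt" | "kts" => "Kotlin",
        "swift" => "Swift",
        "php" => "PHP",
        "sh" | "bash" => "Bash",
        "dart" => "Dart",
        "scala" | "sc" => "Scala",
        "ex" | "exs" => "Elixir",
        "zig" => "Zig",
        // .svelte, .vue, and anything else not listed above
        _ => return Extractor::RegexFallback,
    };
    Extractor::TreeSitter(lang)
}

fn main() {
    for path in ["src/lib.rs", "web/App.vue", "build.zig", "README"] {
        println!("{path} -> {:?}", pick_extractor(path));
    }
}
```

Files without an extension, such as `README` in this sketch, take the same fallback path as `.vue` and `.svelte`.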

How It Works

When you use ctx_read with --mode signatures or --mode map:

  1. LeanCTX detects the file extension and loads the corresponding tree-sitter grammar.
  2. The source code is parsed into an AST in a single pass.
  3. Pre-compiled SCM queries match definition nodes (functions, classes, structs, etc.).
  4. Each match is converted into a compact Signature object with name, parameters, return type, visibility, and async status.
  5. The signatures are formatted using compact notation (or TDD notation if enabled).
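
Steps 4 and 5 can be sketched as a small data type plus a formatter. This is an illustrative sketch only: the `Signature` struct below is a hypothetical shape built from the fields listed in step 4, and `format_compact` is modeled on the Rust sample output shown later on this page. The assumption that `⊛` marks public or exported definitions is a guess from the samples, and since the samples do not show how async status is rendered, the sketch stores it without printing it.

```rust
/// Hypothetical shape of an extracted signature (fields taken from step 4).
struct Signature {
    name: String,
    /// (parameter name, type) pairs in declaration order.
    params: Vec<(String, String)>,
    return_type: Option<String>,
    is_public: bool,
    /// Stored but not rendered: the samples do not show an async marker.
    #[allow(dead_code)]
    is_async: bool,
}

/// Render a signature on one line, modeled on the sample output.
fn format_compact(sig: &Signature) -> String {
    let params = sig
        .params
        .iter()
        .map(|(name, ty)| format!("{name}:{ty}"))
        .collect::<Vec<_>>()
        .join(", ");
    // Assumption: the marker flags public/exported definitions.
    let marker = if sig.is_public { "⊛ " } else { "" };
    let ret = match &sig.return_type {
        Some(ty) => format!(" → {ty}"),
        None => String::new(),
    };
    format!("fn {marker}{}({params}){ret}", sig.name)
}

/// The multi-line Rust function from the example, as a Signature value.
fn sample() -> Signature {
    Signature {
        name: "complex_function".to_string(),
        params: vec![
            ("first_arg".to_string(), "&str".to_string()),
            ("second_arg".to_string(), "Vec<T>".to_string()),
            (
                "third_arg".to_string(),
                "Option<HashMap<String, Vec<u8>>>".to_string(),
            ),
        ],
        return_type: Some("Result<(), Box<dyn Error>>".to_string()),
        is_public: true,
        is_async: false,
    }
}

fn main() {
    println!("{}", format_compact(&sample()));
}
```

Keeping the extracted data separate from its rendering is what would let the same `Signature` value be printed in either compact or TDD notation.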

Example: Multi-line Rust Signature

The regex extractor would miss this multi-line function; tree-sitter handles it correctly:

// Source (Rust)
pub fn complex_function<T: Display + Debug>(
    first_arg: &str,
    second_arg: Vec<T>,
    third_arg: Option<HashMap<String, Vec<u8>>>,
) -> Result<(), Box<dyn Error>> {
    Ok(())
}

// signatures mode output:
fn ⊛ complex_function(first_arg:&str, second_arg:Vec<T>, third_arg:Option<HashMap<String, Vec<u8>>>) → Result<(), Box<dyn Error>>
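
To see why a line-by-line approach misses signatures like the one above, consider a deliberately naive matcher. This is not LeanCTX's regex extractor; `line_based_signatures` is a hypothetical stand-in that, like a single-line pattern, only recognizes a function when `fn`, the opening parenthesis, and the closing parenthesis all appear on the same line.

```rust
/// Naive line-by-line matcher (hypothetical stand-in for a single-line regex).
/// A function is reported only if its whole parameter list sits on one line.
fn line_based_signatures(source: &str) -> Vec<String> {
    source
        .lines()
        .map(str::trim)
        .filter(|line| line.contains("fn ") && line.contains('(') && line.contains(')'))
        // Drop the opening brace of the body, keeping only the signature.
        .map(|line| line.trim_end_matches('{').trim_end().to_string())
        .collect()
}

fn main() {
    let one_line = "pub fn short(a: u8) -> u8 {\n    a\n}\n";
    let multi_line = "pub fn long(\n    a: u8,\n    b: u8,\n) -> u8 {\n    a + b\n}\n";

    // prints ["pub fn short(a: u8) -> u8"]
    println!("{:?}", line_based_signatures(one_line));
    // prints []: no single line holds the full signature
    println!("{:?}", line_based_signatures(multi_line));
}
```

A parser that builds a syntax tree does not have this limitation, because the function node spans all of its lines regardless of how the source is wrapped.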

Example: Arrow Functions (TypeScript)

// Source (TypeScript)
export const fetchData = async (url: string): Promise<Response> => {
    return fetch(url);
};

// signatures mode output:
fn ⊛ fetchData(url:s) → Promise<Response>

Binary Size

Tree-sitter grammars include compiled C parsers, which increases the binary size from ~5.7 MB to ~17 MB. If size is critical, you can build without tree-sitter:

# Build from source without tree-sitter (regex-only, 4 languages)
cargo install lean-ctx --no-default-features

The default Homebrew formula and cargo install lean-ctx include tree-sitter with all 18 languages.

See also