Since v1.5.0, LeanCTX uses tree-sitter for signature extraction instead of line-by-line regex matching. Tree-sitter parses source code into a full Abstract Syntax Tree (AST), enabling accurate extraction of multi-line signatures, arrow functions, decorators, and nested definitions that regex cannot handle.
Why tree-sitter?
| Capability | Regex (old) | tree-sitter (new) |
|---|---|---|
| Multi-line signatures | Missed | Fully parsed |
| Arrow functions | Missed | Fully parsed |
| Nested classes / methods | Indentation heuristic | AST scope tracking |
| Decorators / attributes | Ignored | Associated with definitions |
| Languages supported | 4 | 18 |
| Accuracy | ~85% | ~99% |
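To make the "Multi-line signatures" row concrete, here is a minimal sketch of a line-by-line matcher. It is not LeanCTX's former extractor; the function name `line_based_signatures` and its matching rules are assumptions chosen only to show why any approach that inspects one line at a time loses a signature whose parameter list spans several lines.

```rust
/// Hypothetical line-by-line extractor (an illustration, not LeanCTX code).
/// It recognizes a Rust function only when `fn`, the opening parenthesis,
/// and a closing parenthesis all appear on the same line.
fn line_based_signatures(source: &str) -> Vec<String> {
    let mut found = Vec::new();
    for line in source.lines() {
        let trimmed = line.trim_start();
        // Drop an optional `pub ` prefix, then require the `fn ` keyword.
        let rest = trimmed.strip_prefix("pub ").unwrap_or(trimmed);
        let after_fn = match rest.strip_prefix("fn ") {
            Some(s) => s,
            None => continue,
        };
        // Look for `(` and the first `)` after it on this single line.
        if let Some(open) = after_fn.find('(') {
            if let Some(offset) = after_fn[open..].find(')') {
                let close = open + offset;
                found.push(after_fn[..=close].to_string());
            }
        }
    }
    found
}
```

A one-line `pub fn add(a: i32, b: i32) -> i32 {` is captured, while a function whose parameters are split across lines yields nothing, because no single line contains both parentheses. A parser that builds a tree for the whole file does not have this limitation.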
Supported Languages
Each language has dedicated tree-sitter queries that extract the following definition types:
| Language | Extensions | Extracted Definitions |
|---|---|---|
| TypeScript | .ts, .tsx | function, class, abstract class, interface, type alias, method, arrow function |
| JavaScript | .js, .jsx | function, class, method, arrow function |
| Rust | .rs | fn, struct, enum, trait, impl, type, const |
| Python | .py | def, class, async def |
| Go | .go | func, method, type (struct/interface) |
| Java | .java | class, interface, enum, method, constructor |
| C | .c, .h | function, struct, enum, typedef |
| C++ | .cpp, .cc, .hpp | function, class, struct, enum, namespace |
| Ruby | .rb | class, module, method, singleton_method |
| C# | .cs | class, struct, interface, method, property, record |
| Kotlin | .kt, .kts | class, object, fun, property, interface |
| Swift | .swift | class, struct, enum, func, protocol, extension |
| PHP | .php | class, function, interface, trait, namespace |
| Bash / Shell | .sh, .bash | function |
| Dart | .dart | class, mixin, extension, function, method, enum |
| Scala | .scala, .sc | class, object, trait, def, val |
| Elixir | .ex, .exs | defmodule, def, defp, defmacro |
| Zig | .zig | fn, struct, enum, union |
Files with any other extension, including Svelte (.svelte) and Vue (.vue), automatically fall back to the regex-based extractor, which targets TS/JS-like syntax.
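The extension-based dispatch and fallback can be sketched as follows. This is an illustration under assumptions: the `Language` and `Extractor` enums and the `pick_extractor` function are hypothetical names, not LeanCTX's API, and only a handful of rows from the table are included for brevity.

```rust
use std::path::Path;

/// A subset of the languages in the table (hypothetical type, for illustration).
#[derive(Debug, PartialEq, Clone, Copy)]
enum Language {
    TypeScript,
    JavaScript,
    Rust,
    Python,
    Go,
}

/// Which extractor handles a file (hypothetical type, for illustration).
#[derive(Debug, PartialEq)]
enum Extractor {
    TreeSitter(Language),
    RegexFallback,
}

/// Map a file path to an extractor based on its extension.
/// Unknown or missing extensions take the regex fallback path.
fn pick_extractor(path: &str) -> Extractor {
    let ext = Path::new(path)
        .extension()
        .and_then(|e| e.to_str())
        .unwrap_or("");
    match ext {
        "ts" | "tsx" => Extractor::TreeSitter(Language::TypeScript),
        "js" | "jsx" => Extractor::TreeSitter(Language::JavaScript),
        "rs" => Extractor::TreeSitter(Language::Rust),
        "py" => Extractor::TreeSitter(Language::Python),
        "go" => Extractor::TreeSitter(Language::Go),
        // .svelte, .vue, and anything else not listed above
        _ => Extractor::RegexFallback,
    }
}
```

Under these assumptions, `pick_extractor("src/App.svelte")` returns `Extractor::RegexFallback`, while `pick_extractor("src/main.rs")` returns `Extractor::TreeSitter(Language::Rust)`.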
How It Works
When you use ctx_read with --mode signatures or --mode map:
- LeanCTX detects the file extension and loads the corresponding tree-sitter grammar.
- The source code is parsed into an AST in a single pass.
- Pre-compiled SCM queries match definition nodes (functions, classes, structs, etc.).
- Each match is converted into a compact Signature object with name, parameters, return type, visibility, and async status.
- The signatures are formatted using compact notation (or TDD notation if enabled).
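The last two steps above can be sketched as a small data type plus a formatter. This is a sketch under stated assumptions, not the actual implementation: the field names and the `format_compact` function are hypothetical, and the reading of `⊛` as a marker for public or exported items is inferred from the examples on this page rather than documented here. The formatter ignores async status because the example output on this page shows no async marker.

```rust
/// Hypothetical shape of an extracted signature. The fields mirror the
/// attributes listed above; the real struct may differ.
#[allow(dead_code)]
struct Signature {
    name: String,
    /// (parameter name, parameter type) pairs in declaration order.
    params: Vec<(String, String)>,
    return_type: Option<String>,
    is_public: bool,
    /// Kept for completeness; not used by the formatter below.
    is_async: bool,
}

/// Render a signature in a compact one-line form.
/// Assumption: `⊛` marks a public/exported definition.
fn format_compact(sig: &Signature) -> String {
    let params: Vec<String> = sig
        .params
        .iter()
        .map(|(name, ty)| format!("{}:{}", name, ty))
        .collect();

    let mut out = String::from("fn ");
    if sig.is_public {
        out.push_str("⊛ ");
    }
    out.push_str(&sig.name);
    out.push('(');
    out.push_str(&params.join(", "));
    out.push(')');
    if let Some(ret) = &sig.return_type {
        out.push_str(" → ");
        out.push_str(ret);
    }
    out
}
```

Keeping extraction (building `Signature` values) separate from formatting is what would let the same extracted data feed either compact or TDD notation.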
Example: Multi-line Rust Signature
The regex extractor would miss this multi-line function; tree-sitter handles it correctly:
// Source (Rust)
pub fn complex_function<T: Display + Debug>(
    first_arg: &str,
    second_arg: Vec<T>,
    third_arg: Option<HashMap<String, Vec<u8>>>,
) -> Result<(), Box<dyn Error>> {
    Ok(())
}
// signatures mode output:
fn ⊛ complex_function(first_arg:&str, second_arg:Vec<T>, third_arg:Option<HashMap<String, Vec<u8>>>) → Result<(), Box<dyn Error>>
Example: Arrow Functions (TypeScript)
// Source (TypeScript)
export const fetchData = async (url: string): Promise<Response> => {
    return fetch(url);
};
// signatures mode output:
fn ⊛ fetchData(url:s) → Promise<Response>
Binary Size
Tree-sitter grammars include compiled C parsers, which increases the binary size from ~5.7 MB to ~17 MB. If size is critical, you can build without tree-sitter:
# Build from source without tree-sitter (regex-only, 4 languages)
cargo install lean-ctx --no-default-features
The default Homebrew formula and cargo install lean-ctx include tree-sitter with all 18 languages.