CLI
larkx bench
Real-time token benchmark from Claude. Not estimates.
Every number reported by larkx bench comes straight from Claude. These are the tokens and cost Anthropic actually bills you for.
Requirements
- Claude Code CLI installed and authenticated. Get it at claude.ai/download.
- Project is indexed: run larkx init and larkx index first.
- Some Claude usage budget: a single --only=overview run costs roughly $0.10–0.20.
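A quick preflight check can catch a missing tool before any budget is spent. The sketch below is illustrative only: check_prereqs is a hypothetical helper, and it assumes both CLIs are on your PATH under the names claude and larkx. It does not verify authentication or that the project is indexed.

```shell
#!/usr/bin/env bash
# check_prereqs: hypothetical helper, not part of larkx.
# Reports any required CLI that is not on PATH; returns 1 if one is missing.
check_prereqs() {
  local missing=0
  local tool
  for tool in claude larkx; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

check_prereqs || echo "Install the missing tools before running larkx bench."
```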
Usage
bash
larkx bench [prompt...] [options]
Options
| Flag | Effect |
|---|---|
| --only <ids> | Comma-separated subset of the auto-generated suite. IDs: overview, file-summary, find-symbol, call-chain, impact, dead-code. |
| --ask "..." | Add a single custom prompt on top of the suite. Same effect as passing a positional prompt. |
| --trials <n> | Average over N runs per side. Default 1. Use 3–5 to smooth out Claude's ±10–20% per-run variance. |
| --model <name> | Claude model to use (e.g. claude-haiku-4-5-20251001 for cheaper runs). |
| --timeout <sec> | Per-call timeout. Default 120 seconds. |
Examples
bash
# Cheapest sanity check, one query, ~1 min
larkx bench --only=overview
# Full auto-generated suite
larkx bench
# Add your own prompt on top of the suite
larkx bench "How does authentication work in this codebase?"
# Subset of the suite plus your own prompt
larkx bench --only=overview "How does the parser work?"
# Use the cheaper Haiku model
larkx bench --only=overview --model=claude-haiku-4-5-20251001
# Average over 3 trials per side to smooth out variance
larkx bench --only=overview --trials=3
Example output
larkx bench --only=overview, real run
larkx real benchmark (claude code, before vs after)
project : D:\SUMIT\my-project
queries : 1
▸ [overview] Give me a high-level overview of this codebase.
baseline ... 101K tok · $0.1565 · 7 turns · 38.3s
larkx ... 52K tok · $0.0903 · 3 turns · 27.6s
Summary (real tokens reported by Claude Code):
id            baseline     larkx   saved     base $    larkx $   turns(b/l)
---------------------------------------------------------------------------
overview          101K       52K     49%    $0.1565    $0.0903          7/3
---------------------------------------------------------------------------
TOTAL             101K       52K     49%    $0.1565    $0.0903
full report → .larkx/bench/2026-05-19T16-34-48-271Z.json
Slash command (Claude Code)
If you ran larkx init and picked Claude Code, you also got a slash command. Type any of these inside your IDE chat:
bash
/larkx-bench
/larkx-bench --only=overview
/larkx-bench "How does authentication work?"
/larkx-bench --only=overview "How does the parser work?"
/larkx-bench --model=claude-haiku-4-5-20251001
Saved reports
Every run also writes a JSON report to .larkx/bench/<timestamp>.json — handy for blog posts, PR descriptions, or sharing with a teammate.
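To grab the most recent report for sharing, something like the following may help. It is a sketch under two assumptions: latest_report is a hypothetical helper (not a larkx command), and report filenames keep the fixed-width timestamp format shown in the example output, so that sorting by name is also sorting by time. The report's JSON schema is not documented here, so the sketch only pretty-prints the file.

```shell
#!/usr/bin/env bash
# latest_report [DIR]: hypothetical helper that prints the newest report path.
# Globs expand in sorted order, so the last match is the latest timestamp.
latest_report() {
  local dir="${1:-.larkx/bench}"
  local latest=""
  local f
  for f in "$dir"/*.json; do
    [ -e "$f" ] && latest="$f"
  done
  [ -n "$latest" ] && printf '%s\n' "$latest"
}

report=$(latest_report)
if [ -n "$report" ]; then
  # Pretty-print the raw JSON without assuming any field names.
  python3 -m json.tool "$report"
else
  echo "No reports found; run larkx bench first."
fi
```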
Methodology caveats
Numbers are real, but a few things to be aware of when interpreting them:
- System-prompt asymmetry. The larkx run gets a system prompt telling Claude to prefer larkx MCP tools (the same one larkx init installs in CLAUDE.md). The baseline gets no equivalent "be token-efficient" coaching. This reflects the real production setup of each, but it does mean we're comparing "Claude + larkx (configured)" vs "Claude (vanilla)", not isolated MCP overhead.
- Single-trial variance. Two runs of the same prompt can differ by ±10–20%. For a defensible number, use --trials=3 or higher and report the mean.
- Server-side caching. Anthropic caches prompt prefixes. Repeated runs of the same query may show lower numbers than the first call. Baseline runs before larkx in our loop, so any warm-cache benefit slightly favours larkx; use --trials to wash this out.
- Project-specific. The headline numbers in our marketing material are measured on the larkx repo. Your own savings depend on project size, file shape, and the kind of questions you ask. Tiny projects (under ~30 files) can see flat or negative savings, because larkx's MCP responses can cost more than just reading 2–3 small files.
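To make "report the mean" concrete, here is a small sketch of the arithmetic. The helpers mean and pct_saved are hypothetical, and the per-trial token counts are made-up illustration values, not measurements. Only the 101K and 52K figures come from the example output above.

```shell
#!/usr/bin/env bash
# mean N...: arithmetic mean of the arguments, printed with one decimal place.
mean() {
  printf '%s\n' "$@" | awk '{ sum += $1 } END { printf "%.1f\n", sum / NR }'
}

# pct_saved BASELINE LARKX: reduction relative to the baseline, as a whole percent.
pct_saved() {
  awk -v b="$1" -v l="$2" 'BEGIN { printf "%.0f\n", (b - l) / b * 100 }'
}

# Single-trial figures from the example output (thousands of tokens).
echo "single trial saved: $(pct_saved 101 52)%"

# Hypothetical three-trial run, in thousands of tokens per side.
baseline_mean=$(mean 101 96 108)
larkx_mean=$(mean 52 55 49)
echo "baseline mean: ${baseline_mean}K, larkx mean: ${larkx_mean}K"
echo "mean saved: $(pct_saved "$baseline_mean" "$larkx_mean")%"
```

Computing the saving from the two means, rather than averaging per-trial percentages, keeps one unusually cheap or expensive run from dominating the headline figure.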
Troubleshooting
- Rate-limit hit mid-run: larkx stops immediately instead of burning the rest of your queries. Re-run later or use --only to pick a subset.
- Permission prompt in Claude Code for larkx bench: re-run larkx init with Claude Code selected, or add "Bash(larkx bench)" and "Bash(larkx bench:*)" to .claude/settings.json > permissions.allow.
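For the manual route, the two permission entries would sit in .claude/settings.json roughly as below. This is a minimal fragment that assumes the file has no other settings. If yours already has a permissions.allow array, append the two strings to it rather than replacing the file.

```json
{
  "permissions": {
    "allow": [
      "Bash(larkx bench)",
      "Bash(larkx bench:*)"
    ]
  }
}
```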