larkx bench

Real-time token benchmark from Claude. Not estimates.

Every number reported by larkx bench comes straight from Claude. These are the tokens and cost Anthropic actually bills you for.

Requirements

Claude Code CLI installed and authenticated. Get it at claude.ai/download.
An indexed project. Run larkx init and larkx index first.
Some Claude usage budget. A single --only=overview run costs roughly $0.10–0.20.

Usage

bash

larkx bench [prompt...] [options]

Options

Flag	Effect
`--only <ids>`	Comma-separated subset of the auto-generated suite. IDs: `overview`, `file-summary`, `find-symbol`, `call-chain`, `impact`, `dead-code`.
`--ask "..."`	Add a single custom prompt on top of the suite. Same effect as passing a positional prompt.
`--trials <n>`	Average over N runs per side. Default `1`. Use `3`–`5` to smooth out Claude's ±10–20% per-run variance.
`--model <name>`	Claude model to use (e.g. `claude-haiku-4-5-20251001` for cheaper runs).
`--timeout <sec>`	Per-call timeout. Default `120` seconds.

Examples

bash

# Cheapest sanity check, one query, ~1 min
larkx bench --only=overview

# Full auto-generated suite
larkx bench

# Add your own prompt on top of the suite
larkx bench "How does authentication work in this codebase?"

# Subset of the suite plus your own prompt
larkx bench --only=overview "How does the parser work?"

# Use the cheaper Haiku model
larkx bench --only=overview --model=claude-haiku-4-5-20251001

# Average over 3 trials per side to smooth out variance
larkx bench --only=overview --trials=3

Example output

larkx bench --only=overview, real run

larkx real benchmark (claude code, before vs after)
project : D:\SUMIT\my-project
queries : 1

▸ [overview] Give me a high-level overview of this codebase.
  baseline ... 101K tok · $0.1565 · 7 turns · 38.3s
  larkx    ... 52K tok  · $0.0903 · 3 turns · 27.6s

Summary (real tokens reported by Claude Code):
id              baseline   larkx     saved   base $    larkx $   turns(b/l)
---------------------------------------------------------------------------
overview        101K       52K       49%     $0.1565   $0.0903   7/3
---------------------------------------------------------------------------
TOTAL           101K       52K       49%     $0.1565   $0.0903

full report → .larkx/bench/2026-05-19T16-34-48-271Z.json
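The saved column in the run above is consistent with a plain percentage reduction relative to the baseline. A minimal sketch of that arithmetic follows; the formula and the saved_pct helper are assumptions for illustration, not something larkx documents or ships.

```shell
# saved_pct BASELINE LARKX
# Percentage reduction relative to the baseline, rounded to a whole percent.
# Assumed formula: (baseline - larkx) / baseline * 100.
saved_pct() {
  awk -v base="$1" -v larkx="$2" \
    'BEGIN { printf "%.0f\n", (base - larkx) / base * 100 }'
}

saved_pct 101 52          # tokens in thousands, prints 49
saved_pct 0.1565 0.0903   # cost in dollars, prints 42
```

The same run therefore saves about 49% of tokens but about 42% of cost, so the two figures are worth quoting separately.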

Slash command (Claude Code)

If you ran larkx init and picked Claude Code, you also got a slash command. Type any of these inside your IDE chat:

bash

/larkx-bench
/larkx-bench --only=overview
/larkx-bench "How does authentication work?"
/larkx-bench --only=overview "How does the parser work?"
/larkx-bench --model=claude-haiku-4-5-20251001

Saved reports

Every run also writes a JSON report to .larkx/bench/<timestamp>.json, which is handy for blog posts, PR descriptions, or sharing with a teammate.
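To pick up the most recent report from a script, something like the sketch below may be enough. It assumes every report name follows the same fixed-width timestamp format as the example above, so that a lexical sort is also chronological. The latest_report helper is hypothetical and not part of larkx.

```shell
# latest_report [DIR]
# Print the path of the newest report in DIR (default: .larkx/bench).
# Relies on the timestamped file names sorting chronologically.
latest_report() {
  dir="${1:-.larkx/bench}"
  # No reports yet (or wrong directory): print nothing, return non-zero.
  ls "$dir"/*.json >/dev/null 2>&1 || return 1
  ls "$dir"/*.json | sort | tail -n 1
}

report="$(latest_report)" && echo "latest report: $report"
```

The path it prints can then be attached to a PR or passed to whatever JSON viewer you already use.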

Methodology caveats

The numbers are real, but keep a few things in mind when interpreting them:

System-prompt asymmetry. The larkx run gets a system prompt telling Claude to prefer larkx MCP tools (the same one larkx init installs in CLAUDE.md). The baseline gets no equivalent "be token-efficient" coaching. This reflects the real production setup of each, but it does mean we're comparing "Claude + larkx (configured)" vs "Claude (vanilla)", not isolated MCP overhead.
Single-trial variance. Two runs of the same prompt can differ by ±10–20%. For a defensible number, use --trials=3 or higher and report the mean.
Server-side caching. Anthropic caches prompt prefixes. Repeated runs of the same query may show lower numbers than the first call. Baseline runs before larkx in our loop, so any warm-cache benefit slightly favours larkx. Use --trials to wash this out.
Project-specific. The headline numbers in our marketing material are measured on the larkx repo. Your own savings depend on project size, file shape, and the kind of questions you ask. Tiny projects (under ~30 files) can see flat or negative savings, because larkx's MCP responses can cost more than just reading 2–3 small files.
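One rough way to judge whether a single-trial result survives the per-run variance is to shift both sides against larkx by the same fraction and recompute the saving. The sketch below does that for the 101K vs 52K run shown earlier. Reading "±10–20%" as a simple worst-case multiplier is an assumption made here for illustration, and worst_case_saved_pct is a hypothetical helper, not a larkx feature.

```shell
# worst_case_saved_pct BASELINE LARKX VARIANCE
# Pessimistic saving: the baseline comes in VARIANCE lower than measured and
# the larkx run VARIANCE higher. VARIANCE is a fraction, e.g. 0.2 for 20%.
worst_case_saved_pct() {
  awk -v base="$1" -v larkx="$2" -v v="$3" 'BEGIN {
    low_base   = base  * (1 - v)
    high_larkx = larkx * (1 + v)
    printf "%.0f\n", (low_base - high_larkx) / low_base * 100
  }'
}

worst_case_saved_pct 101 52 0.1   # prints 37
worst_case_saved_pct 101 52 0.2   # prints 23
```

Under that reading the example run still shows a saving at the pessimistic end, but a result with a much smaller gap would not, which is where --trials earns its cost.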

Troubleshooting

Rate-limit hit mid-run: larkx stops immediately instead of burning the rest of your queries. Re-run later or use --only to pick a subset.
Permission prompt in Claude Code for larkx bench: re-run larkx init with Claude Code selected, or add "Bash(larkx bench)" and "Bash(larkx bench:*)" to .claude/settings.json > permissions.allow.
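To see whether the two permission entries are already in place before editing anything, a quick text search can help. This is a minimal sketch: check_bench_permissions is a hypothetical helper, and it only looks for the quoted strings rather than parsing the JSON, so it cannot tell whether they sit under permissions.allow.

```shell
# Expected shape inside .claude/settings.json:
#   { "permissions": { "allow": ["Bash(larkx bench)", "Bash(larkx bench:*)"] } }

# check_bench_permissions [FILE]
# Report which of the two entries FILE contains (default: .claude/settings.json).
# A missing or unreadable file is reported as both entries missing.
check_bench_permissions() {
  settings="${1:-.claude/settings.json}"
  for rule in 'Bash(larkx bench)' 'Bash(larkx bench:*)'; do
    if grep -qF "\"$rule\"" "$settings" 2>/dev/null; then
      echo "present: $rule"
    else
      echo "missing: $rule"
    fi
  done
}

check_bench_permissions
```

Any entry reported as missing can be added by hand, or by re-running larkx init with Claude Code selected.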