How I Cut My Claude Code Token Usage by 90% with RTK

2026-06-26

Parker Jones and Claude Opus 4.8 in software

#claude-code, #ai, #agents, #cli, #tokens, #rust, and #tooling

6 minute read

I've been running coding agents against real work for a while now, and the thing nobody warns you about is the output tax. Every time the agent runs git status, cargo test, or aws lambda list-functions, the entire wall of text it gets back is fed into the model's context. You pay for those tokens once when they arrive — and then again on every single turn after, because that output sits in the context window and gets re-sent until the conversation ends.

Most of that text is noise. ANSI color codes. Timestamps. Table columns I don't care about. JSON fields I'll never read. So I started routing every command through RTK (Rust Token Killer) — an open-source CLI proxy, written in Rust, built to strip exactly this kind of bloat before it reaches the model. It's a brew install rtk away.

Here's what it's done across my last few thousand commands:

RTK Token Savings (Global Scope)
════════════════════════════════════════════════════════════
Total commands:    6283
Input tokens:      26.9M
Output tokens:     2.5M
Tokens saved:      24.5M (90.9%)
Total exec time:   462m34s (avg 4.4s)
Efficiency meter:  ██████████████████████░░ 90.9%

Six thousand commands of my actual usage, 24.5 million tokens saved — 90.9%. That's not a vendor benchmark; it's my own rtk gain report.

The hidden cost of agent tool output

When you use a chat model directly, you read its output and the cost stops there. When you put a model in an agent loop, the economics flip. The model runs a tool, the tool's stdout/stderr comes back as a tool result, and that result becomes part of the running transcript. Every subsequent turn re-sends the whole transcript. So a 4,000-token cargo test dump on turn 3 is still being paid for on turn 30.
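To put a rough number on that last sentence, here is a back-of-the-envelope sketch. It assumes the simplest possible billing model: the result is sent once when it arrives and re-sent in full on every later turn, with no prompt caching or context compaction. The function name and the 400-token "filtered" figure are illustrative assumptions, not something RTK reports.

```python
def billed_input_tokens(result_tokens: int, arrived_turn: int, last_turn: int) -> int:
    """Total input tokens billed for one tool result over a conversation.

    Assumes (hypothetically) that the result is sent once on the turn it
    arrives and re-sent in full on every later turn, with no caching.
    """
    if last_turn < arrived_turn:
        raise ValueError("last_turn must not be earlier than arrived_turn")
    times_sent = last_turn - arrived_turn + 1
    return result_tokens * times_sent


# The 4,000-token cargo test dump from turn 3, in a 30-turn conversation.
raw = billed_input_tokens(4_000, arrived_turn=3, last_turn=30)
# The same result if it had been filtered down to 400 tokens (illustrative figure).
lean = billed_input_tokens(400, arrived_turn=3, last_turn=30)
print(raw, lean)  # 112000 11200
```

Under those assumptions a single 4,000-token dump costs 112,000 input tokens by turn 30. Real billing with caching will be cheaper, but the multiplier is the point.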

The fix isn't "run fewer commands." Agents should poke at the system constantly — that's what makes them useful. The fix is to make each command's output carry only the information the model actually needs to make its next decision.

That's a filtering problem, and filtering is exactly the kind of thing Rust is good at: fast, streaming, cheap enough that wrapping every command in the agent's hot loop adds no friction. In my usage RTK averages 4.4s per command — and most of that is the underlying command itself, not the proxy.
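As a flavor of what "streaming filter" means here, a minimal sketch of the cheapest kind of cleanup: stripping ANSI escape codes line by line. This is Python for readability, not RTK's Rust code, and the function name is made up for the example.

```python
import re
from typing import Iterable, Iterator

# Matches ANSI CSI escape sequences such as "\x1b[32m" (color) or "\x1b[2K" (erase line).
ANSI_CSI = re.compile(r"\x1b\[[0-9;?]*[ -/]*[@-~]")


def lean_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield lines with escape codes removed, dropping lines left empty."""
    for line in lines:
        cleaned = ANSI_CSI.sub("", line).rstrip()
        if cleaned:
            yield cleaned


raw = ["\x1b[32m   Compiling\x1b[0m demo v0.1.0", "\x1b[2K", "test result: ok."]
print(list(lean_lines(raw)))
# ['   Compiling demo v0.1.0', 'test result: ok.']
```

Because it is a generator, nothing is buffered: each line is cleaned and passed on as it arrives, which is the property that keeps a proxy like this out of the way.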

How it works: transparent, hook-based rewriting

What sold me on it is that I never have to think about it. I don't want to teach the agent to "use rtk" — I want every command it already runs to get filtered automatically.

So I wired RTK into Claude Code as a hook. When the agent decides to run git status, the hook rewrites it to rtk git status before execution. RTK runs the real command, reshapes the output, and hands back the lean version. The agent never knows the proxy is there, and the rewrite itself costs zero tokens.

git status        →   rtk git status      (transparent, 0 tokens overhead)
cargo test        →   rtk cargo test
aws lambda list…  →   rtk aws lambda list-functions
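The rewrite step itself is conceptually tiny. Here is a minimal sketch of the idea, assuming a simple allowlist of programs; it does not reproduce RTK's actual hook or the payload Claude Code passes to hooks, and the PROXIED set and rewrite function are hypothetical names.

```python
import shlex

# Hypothetical allowlist: programs whose output is worth filtering.
PROXIED = {"git", "cargo", "aws", "gh", "ps"}


def rewrite(command: str) -> str:
    """Prefix a command with `rtk` if its program is on the allowlist."""
    try:
        words = shlex.split(command)
    except ValueError:
        # Unbalanced quotes: leave the command untouched.
        return command
    if not words or words[0] == "rtk" or words[0] not in PROXIED:
        return command
    return f"rtk {command}"


print(rewrite("git status"))      # rtk git status
print(rewrite("rtk git status"))  # rtk git status
print(rewrite("ls -la"))          # ls -la
```

A real hook would also have to cope with pipes, environment-variable prefixes, and chained commands, which this sketch ignores.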

There are a handful of meta-commands I call directly:

rtk gain              # token-savings analytics (the table above)
rtk gain --history    # per-command history with savings
rtk discover          # mine Claude Code history for missed opportunities
rtk proxy <cmd>       # run a command raw, no filtering (escape hatch)

Where the savings actually come from

Not every command saves the same amount, and that's the interesting part. Breaking down my top commands by impact:

  #  Command                   Count   Saved    Avg%
 1.  rtk cargo test --work…        7   10.8M  100.0%
 2.  rtk aws lambda list-f…        9    3.1M   22.2%
 3.  rtk read                    602    2.7M   18.0%
 4.  rtk git push origin …         1    1.8M  100.0%
 5.  rtk cargo test --work…        1    1.8M  100.0%
 6.  rtk git push -u origi…        1    1.7M  100.0%
 7.  rtk aws cloudformatio…        2  324.0K   50.0%
 8.  rtk:toml ps -ef               6  298.1K   98.7%
 9.  rtk:toml ps aux               5  288.8K   98.5%
10.  rtk gh pr diff               18  110.7K   48.1%

Three distinct categories show up here:

Full suppression (100%). cargo test --workspace and git push produce enormous progress streams (compiler chatter, per-test lines, transfer counters), and the only thing the model needs is whether the command passed or what failed. When everything succeeds, the right answer is a one-line summary, and the raw 10.8M tokens of test output never enter context. That single command class is my biggest win by a mile.

Reshaping (98%). ps -ef and ps aux are wide, repetitive tables. Run them through RTK's structured :toml output mode and you get a compact, parseable representation — 98%+ smaller, and easier for the model to reason about than a column-aligned ASCII table.

Trimming (18–50%). read, gh pr diff, and aws calls don't get gutted, because the content matters, but there's still 18–50% of pure formatting cruft to shave. Note that read ran 602 times: small per-call savings on a high-frequency command add up to 2.7M tokens.
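Of the three categories, full suppression is the easiest to picture in code. The sketch below is an assumption about the general shape of such a filter, not RTK's implementation: on success it collapses the run to one line, on failure it keeps only the failing tests and the result summaries. It relies on the standard "test result:" and "... FAILED" lines that cargo test prints; the function name is hypothetical.

```python
import re

PASSED = re.compile(r"(\d+) passed")


def summarize_cargo_test(exit_code: int, output: str) -> str:
    """Collapse a test run to what the model needs for its next decision."""
    lines = output.splitlines()
    results = [l.strip() for l in lines if l.startswith("test result:")]
    if exit_code == 0:
        # Success: one short line; everything else is suppressed.
        total = sum(int(n) for n in PASSED.findall("\n".join(results)))
        return f"ok: {total} passed"
    # Failure: keep the failing test lines plus the result summaries.
    failed = [l.strip() for l in lines if l.rstrip().endswith("... FAILED")]
    # If nothing recognizable was found (e.g. a compile error), fall back to raw output.
    return "\n".join(failed + results) or output


passing = (
    "running 2 tests\n"
    "test a ... ok\n"
    "test b ... ok\n"
    "\n"
    "test result: ok. 2 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s\n"
)
print(summarize_cargo_test(0, passing))  # ok: 2 passed
```

The fallback to raw output on an unrecognized failure matters: a filter that hides a compile error is worse than no filter.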

The lesson I keep relearning: the things you do constantly (read) and the things that dump enormous one-shot payloads (cargo test) both matter. Most of the volume hides at those two extremes, not in the middle.
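Dividing savings by call count makes the two regimes visible. The rows below are copied from the table above (so they inherit its rounding); the helper name is just for illustration.

```python
def per_call(tokens_saved: float, count: int) -> float:
    """Average tokens saved per invocation."""
    return tokens_saved / count


# (command, call count, tokens saved) copied from the table above.
rows = [
    ("cargo test --work…", 7, 10_800_000),
    ("read", 602, 2_700_000),
    ("gh pr diff", 18, 110_700),
]

for command, count, saved in rows:
    print(f"{command:<20} {per_call(saved, count):>12,.0f} tokens saved per call")
```

That works out to roughly 1.5M tokens per cargo test run against about 4.5K per read: one wins on payload size, the other on sheer frequency.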

Structured output with `:toml` mode

The :toml variants above aren't a gimmick. ASCII tables are optimized for human eyes — alignment, separators, headers repeated for readability. A model doesn't need any of that; it needs keys and values. Emitting ps aux as compact TOML drops the token count by ~98% and removes the ambiguity of parsing whitespace-aligned columns. Cheaper and more reliable at the same time is a rare trade to win.
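To make the table-versus-keys point concrete, here is a sketch that turns a ps-style table into compact TOML records. The post does not show what RTK actually emits, so the [[process]] schema, the choice of kept columns, and the function name are all assumptions made for illustration.

```python
import json


def ps_to_toml(ps_output: str, keep=("PID", "%CPU", "%MEM", "COMMAND")) -> str:
    """Turn a whitespace-aligned ps table into compact TOML records."""
    header, *rows = ps_output.strip().splitlines()
    columns = header.split()
    blocks = []
    for row in rows:
        # COMMAND is the last column and may contain spaces, so cap the splits.
        values = row.split(None, len(columns) - 1)
        record = dict(zip(columns, values))
        # json.dumps gives quoting that is valid TOML for ordinary ASCII values.
        fields = [
            f"{col.lower().strip('%')} = {json.dumps(record[col])}"
            for col in keep
            if col in record
        ]
        blocks.append("[[process]]\n" + "\n".join(fields))
    return "\n\n".join(blocks)


sample = """\
USER  PID %CPU %MEM    VSZ   RSS TTY STAT START   TIME COMMAND
root    1  0.0  0.1 167744 11520 ?   Ss   Jun25   0:03 /sbin/init splash
"""
print(ps_to_toml(sample))
```

Two things shrink the output: dropping columns the model rarely needs, and replacing alignment padding with explicit keys, so there is no guessing where one column ends and the next begins.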

Finding what you're missing: `rtk discover`

The one I reach for when I want to tune things is rtk discover. It reads back through my Claude Code history and flags commands that burned tokens but aren't being proxied yet — the long tail of "oh, I run that a lot." It turns token optimization into a feedback loop instead of a guessing game: see what's expensive, make sure it's routed through RTK, check gain a week later, repeat.
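The core of that feedback loop is a ranking problem. A minimal sketch, assuming the history is available as (command, output tokens) pairs; that format, the function name, and the docker and kubectl entries are invented for the example and are not how rtk discover is implemented.

```python
from collections import Counter


def missed_opportunities(history, top=3):
    """Rank unproxied commands by the output tokens they cost.

    `history` is an iterable of (command, output_tokens) pairs (hypothetical format).
    """
    cost = Counter()
    for command, tokens in history:
        words = command.split()
        if not words or words[0] == "rtk":
            continue  # already proxied
        # Group by program plus first subcommand, e.g. "docker logs".
        cost[" ".join(words[:2])] += tokens
    return cost.most_common(top)


history = [
    ("rtk git status", 40),
    ("docker logs api", 9_000),
    ("docker logs worker", 7_500),
    ("kubectl get pods -A", 3_200),
]
print(missed_opportunities(history))
# [('docker logs', 16500), ('kubectl get', 3200)]
```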

When not to filter

Filtering is lossy by definition, and sometimes I genuinely need the raw bytes — debugging a tool whose exact output format is the thing I'm investigating. That's what rtk proxy <cmd> is for: run it completely untouched. Having a clean escape hatch is what makes aggressive default filtering safe. If a trim ever hides something I needed, I'm one command away from the truth.

One footgun worth flagging: there's a name collision out there. If rtk gain errors with "command not found," you've probably got reachingforthejack/rtk (a Rust Type Kit) on your PATH instead of the token killer. which rtk sorts it out — the one you want is the Homebrew rtk formula.

What I learned

I started using RTK thinking of it as a cost optimization, and it is — but the bigger payoff turned out to be context window hygiene. Tokens are cheap-ish; context is scarce. Every line of noise it strips is a line that isn't crowding out something the model actually needs to remember three turns from now. The 90% I'm not paying for is nice. The 90% that isn't diluting the agent's attention is the real win.

If you're running agents against a noisy CLI, the output tax is real and you're almost certainly paying it — and it's very killable. RTK lives at rtk-ai.app; brew install rtk and rtk gain will tell you your own number.

— Parker Jones, parkerjones.dev