I've been running coding agents against real work for a while now, and the thing nobody warns you about is the output tax. Every time the agent runs git status, cargo test, or aws lambda list-functions, the entire wall of text it gets back is fed into the model's context. You pay for those tokens once when they arrive — and then again on every single turn after, because that output sits in the context window and gets re-sent until the conversation ends.
Most of that text is noise. ANSI color codes. Timestamps. Table columns I don't care about. JSON fields I'll never read. So I built a small Rust tool to strip it out before it ever reaches the model. I call it RTK — Rust Token Killer.
Here's what it's done across my last few thousand commands:
RTK Token Savings (Global Scope)
════════════════════════════════════════════════════════════
Total commands: 6283
Input tokens: 26.9M
Output tokens: 2.5M
Tokens saved: 24.5M (90.9%)
Total exec time: 462m34s (avg 4.4s)
Efficiency meter: ██████████████████████░░ 90.9%
Six thousand commands, 24.5 million tokens saved — 90.9%. That's not a microbenchmark; that's my actual usage.
The hidden cost of agent tool output
When you use a chat model directly, you read its output and the cost stops there. When you put a model in an agent loop, the economics flip. The model runs a tool, the tool's stdout/stderr comes back as a tool result, and that result becomes part of the running transcript. Every subsequent turn re-sends the whole transcript. So a 4,000-token cargo test dump on turn 3 is still being paid for on turn 30.
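To put a number on that, here is a minimal sketch of the arithmetic. It assumes the simplest billing model: the full transcript is re-sent as input on every turn, with no prompt caching and no context compaction (either would lower the figure). The function name and the 400-token "filtered" size are illustrative assumptions, not RTK internals.

```rust
/// Total input tokens billed for one tool result that is first included in
/// the model's input on `first_turn` and re-sent on every turn through
/// `last_turn`. Assumes no prompt caching and no context compaction.
fn cumulative_input_tokens(result_tokens: u64, first_turn: u64, last_turn: u64) -> u64 {
    if last_turn < first_turn {
        return 0;
    }
    // Billed once per turn in the inclusive range first_turn..=last_turn.
    result_tokens * (last_turn - first_turn + 1)
}

fn main() {
    // The 4,000-token dump from the text, present from turn 3 through turn 30.
    let raw = cumulative_input_tokens(4_000, 3, 30);
    // Hypothetical: the same output cut to 400 tokens by a filter.
    let filtered = cumulative_input_tokens(400, 3, 30);
    println!("raw: {raw} tokens, filtered: {filtered} tokens");
}
```

Under those assumptions the single dump is billed 28 times, for 112,000 input tokens in total.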
The fix isn't "run fewer commands." Agents should poke at the system constantly — that's what makes them useful. The fix is to make each command's output carry only the information the model actually needs to make its next decision.
That's a filtering problem, and filtering is exactly the kind of thing Rust is good at: fast, streaming, zero-allocation where it counts, and cheap enough that wrapping every command in the agent's hot loop adds no friction. RTK averages 4.4s per command — and most of that is the underlying command itself, not the proxy.
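As an illustration of the kind of streaming filter involved, here is a minimal sketch that strips ANSI color codes line by line from stdin. It is not RTK's actual code: it only handles CSI sequences (ESC followed by `[`), and it allocates one string per line for clarity rather than aiming for zero allocation.

```rust
use std::io::{self, BufRead, Write};

/// Removes ANSI CSI escape sequences (ESC [ ... final byte), which cover
/// colors and cursor movement. Other escape types (e.g. OSC) are left alone.
fn strip_ansi(line: &str) -> String {
    let mut out = String::with_capacity(line.len());
    let mut chars = line.chars().peekable();
    while let Some(c) = chars.next() {
        if c == '\u{1b}' && chars.peek() == Some(&'[') {
            chars.next(); // consume '['
            // Skip parameter bytes until the final byte (0x40..=0x7E).
            for p in chars.by_ref() {
                if ('\u{40}'..='\u{7e}').contains(&p) {
                    break;
                }
            }
        } else {
            out.push(c);
        }
    }
    out
}

fn main() -> io::Result<()> {
    let stdin = io::stdin();
    let mut stdout = io::stdout().lock();
    // Process one line at a time so output of any size streams through.
    for line in stdin.lock().lines() {
        writeln!(stdout, "{}", strip_ansi(&line?))?;
    }
    Ok(())
}
```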
How it works: transparent, hook-based rewriting
The design goal was that I should never have to think about RTK. I don't want to teach the agent to "use rtk" — I want every command it already runs to get filtered automatically.
So RTK plugs into Claude Code as a hook. When the agent decides to run git status, the hook rewrites it to rtk git status before execution. RTK runs the real command, reshapes the output, and hands back the lean version. The agent never knows the proxy is there, and the rewrite itself costs zero tokens.
git status → rtk git status (transparent, 0 tokens overhead)
cargo test → rtk cargo test
aws lambda list… → rtk aws lambda list-functions
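The rewrite step itself can be sketched as a pure string transformation. This is a simplified assumption of how such a hook might decide what to rewrite, not RTK's implementation: the `PROXIED` allowlist is hypothetical, the Claude Code hook protocol (how the command is received and handed back) is not modelled, and pipes, environment-variable prefixes, and compound shell commands are ignored.

```rust
/// Hypothetical allowlist of programs that have a filter.
const PROXIED: &[&str] = &["git", "cargo", "aws"];

/// Returns the rewritten command line, or `None` if the command should run
/// unchanged (already proxied, empty, or not on the allowlist).
fn rewrite(command: &str) -> Option<String> {
    let trimmed = command.trim();
    let program = trimmed.split_whitespace().next()?;
    if program == "rtk" || !PROXIED.contains(&program) {
        return None;
    }
    Some(format!("rtk {trimmed}"))
}

fn main() {
    for cmd in ["git status", "cargo test", "rtk gain", "ls -la"] {
        match rewrite(cmd) {
            Some(new) => println!("{cmd} -> {new}"),
            None => println!("{cmd} (unchanged)"),
        }
    }
}
```

Returning `None` for anything already starting with `rtk` keeps the rewrite idempotent, so a command is never wrapped twice.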
There are a handful of meta-commands I do call directly:
rtk gain # token-savings analytics (the table above)
rtk gain --history # per-command history with savings
rtk discover # mine Claude Code history for missed opportunities
rtk proxy <cmd> # run a command raw, no filtering (escape hatch)
Where the savings actually come from
Not every command saves the same amount, and that's the interesting part. Breaking down my top commands by impact:
 #  Command                  Count   Saved    Avg%
 1. rtk cargo test --work…       7   10.8M  100.0%
 2. rtk aws lambda list-f…       9    3.1M   22.2%
 3. rtk read                   602    2.7M   18.0%
 4. rtk git push origin …        1    1.8M  100.0%
 5. rtk cargo test --work…       1    1.8M  100.0%
 6. rtk git push -u origi…       1    1.7M  100.0%
 7. rtk aws cloudformatio…       2  324.0K   50.0%
 8. rtk:toml ps -ef              6  298.1K   98.7%
 9. rtk:toml ps aux              5  288.8K   98.5%
10. rtk gh pr diff              18  110.7K   48.1%
Three distinct categories show up here:
Full suppression (100%). cargo test --workspace and git push produce enormous progress streams — compiler chatter, per-test lines, transfer counters — and the only thing the model needs is did it pass or what failed. When everything succeeds, the right answer is a one-line summary, and the raw 10.8M tokens of test output never enter context. That single command class is my biggest win by a mile.
Reshaping (98%). ps -ef and ps aux are wide, repetitive tables. Run them through RTK's structured :toml output mode and you get a compact, parseable representation — 98%+ smaller, and easier for the model to reason about than a column-aligned ASCII table.
Trimming (18–50%). read, gh pr diff, and aws calls don't get gutted, because the content matters, but there's still 18–50% of pure formatting cruft to shave. Note that read ran 602 times: small per-call savings on a high-frequency command add up to 2.7M tokens.
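The full-suppression case can be sketched as follows. This is an illustrative filter, not RTK's: it relies on the standard `test <name> ... ok` and `test <name> ... FAILED` lines that cargo test prints, takes the exit status as a separate `exit_ok` flag (an assumed parameter), and keeps the failure detail whenever anything went wrong.

```rust
/// Collapses `cargo test`-style output. On a clean run it returns a single
/// line; otherwise it keeps everything from the first "failures:" section,
/// or the whole output if there is none (e.g. a compile error).
fn summarize_test_output(raw: &str, exit_ok: bool) -> String {
    let (mut passed, mut failed) = (0usize, 0usize);
    for line in raw.lines() {
        if let Some(rest) = line.trim().strip_prefix("test ") {
            if rest.ends_with("... ok") {
                passed += 1;
            } else if rest.ends_with("... FAILED") {
                failed += 1;
            }
        }
    }
    if exit_ok && failed == 0 {
        return format!("ok: {passed} tests passed");
    }
    // Something failed: the model needs the detail, so do not suppress it.
    let detail = raw.find("failures:").map_or(raw, |i| &raw[i..]);
    format!("FAILED: {failed} failed, {passed} passed\n{}", detail.trim_end())
}

fn main() {
    let raw = "running 2 tests\ntest parse::empty ... ok\ntest parse::nested ... ok\n";
    println!("{}", summarize_test_output(raw, true));
}
```

Checking the exit status matters here: a build that fails to compile prints no test lines at all, and counting lines alone would report it as a pass.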
The lesson I keep relearning: optimize the things you do constantly (read) and the things that dump enormous one-shot payloads (cargo test). Those two extremes, not the middle, are where most of the volume hides.
Structured output with :toml mode
The :toml variants above aren't a gimmick. ASCII tables are optimized for human eyes — alignment, separators, headers repeated for readability. A model doesn't need any of that; it needs keys and values. Emitting ps aux as compact TOML drops the token count by ~98% and removes the ambiguity of parsing whitespace-aligned columns. Cheaper and more reliable at the same time is a rare trade to win.
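A minimal sketch of that reshaping, following the description above. The schema (a `[[proc]]` table with `pid`, `user`, `cpu`, `mem`, `cmd`) is my assumption rather than RTK's actual output, and the parser assumes the Linux procps column order for ps aux. In this sketch most of the reduction comes from dropping columns the model rarely needs.

```rust
/// Escapes a value as a TOML basic string. Only backslash and double quote
/// are handled; control characters are out of scope for this sketch.
fn toml_str(s: &str) -> String {
    format!("\"{}\"", s.replace('\\', "\\\\").replace('"', "\\\""))
}

/// Converts `ps aux`-style text into TOML array-of-tables entries.
/// Expected columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND...
fn ps_aux_to_toml(raw: &str) -> String {
    let mut out = String::new();
    for line in raw.lines() {
        let f: Vec<&str> = line.split_whitespace().collect();
        if f.len() < 11 {
            continue;
        }
        // The header row and malformed rows fail to parse and are skipped.
        let (Ok(pid), Ok(cpu), Ok(mem)) =
            (f[1].parse::<u32>(), f[2].parse::<f64>(), f[3].parse::<f64>())
        else {
            continue;
        };
        // The command may contain spaces, so rejoin everything after TIME.
        let cmd = f[10..].join(" ");
        out.push_str(&format!(
            "[[proc]]\npid = {pid}\nuser = {}\ncpu = {cpu:.1}\nmem = {mem:.1}\ncmd = {}\n",
            toml_str(f[0]),
            toml_str(&cmd)
        ));
    }
    out
}

fn main() {
    let raw = "USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND\n\
               root 1 0.0 0.1 168000 11000 ? Ss Jan01 0:03 /sbin/init splash\n";
    print!("{}", ps_aux_to_toml(raw));
}
```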
Finding what you're missing: rtk discover
The one I reach for when I want to tune things is rtk discover. It reads back through my Claude Code history and flags commands that burned tokens but aren't being proxied yet — the long tail of "oh, I run that a lot." It turns token optimization into a feedback loop instead of a guessing game: ship a filter, run for a week, ask discover what's still expensive, repeat.
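The core of that feedback loop is a ranking: which unproxied programs cost the most tokens in total. Here is a minimal sketch of that idea, assuming the history has already been parsed into records. The `Record` type and its fields are hypothetical; this is neither Claude Code's history format nor how rtk discover is actually implemented.

```rust
use std::collections::HashMap;

/// Hypothetical history record: one executed command and the size of its output.
struct Record<'a> {
    command: &'a str,
    output_tokens: u64,
}

/// Totals output tokens per program for commands that did not go through
/// `rtk`, most expensive first.
fn missed_opportunities(history: &[Record]) -> Vec<(String, u64)> {
    let mut totals: HashMap<&str, u64> = HashMap::new();
    for rec in history {
        let Some(program) = rec.command.split_whitespace().next() else {
            continue;
        };
        if program == "rtk" {
            continue; // already proxied
        }
        *totals.entry(program).or_insert(0) += rec.output_tokens;
    }
    let mut ranked: Vec<(String, u64)> = totals
        .into_iter()
        .map(|(program, tokens)| (program.to_string(), tokens))
        .collect();
    // Sort by tokens descending, then by name so ties have a stable order.
    ranked.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    ranked
}

fn main() {
    let history = [
        Record { command: "docker ps", output_tokens: 5_000 },
        Record { command: "rtk git status", output_tokens: 100 },
        Record { command: "docker logs api", output_tokens: 3_000 },
        Record { command: "kubectl get pods", output_tokens: 2_000 },
    ];
    for (program, tokens) in missed_opportunities(&history) {
        println!("{program}: {tokens} tokens unfiltered");
    }
}
```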
When not to filter
Filtering is lossy by definition, and sometimes I genuinely need the raw bytes — debugging a tool whose exact output format is the thing I'm investigating, or chasing a bug in a filter itself. That's what rtk proxy <cmd> is for: run it completely untouched. Having a clean escape hatch is what makes aggressive default filtering safe. If the trim ever hides something I needed, I'm one command away from the truth.
One footgun worth flagging: there's a name collision out there. If rtk gain errors with "command not found," you've probably got reachingforthejack/rtk (a Rust Type Kit) on your PATH instead. which rtk sorts it out.
What I learned
I went in thinking of this as a cost optimization, and it is — but the bigger payoff turned out to be context window hygiene. Tokens are cheap-ish; context is scarce. Every line of noise I strip is a line that isn't crowding out something the model actually needs to remember three turns from now. The 90% I'm not paying for is nice. The 90% that isn't diluting the agent's attention is the real win.
Next up: smarter, per-command filter profiles (a cargo test filter shouldn't look anything like an aws filter), and packaging RTK so it's a one-line install for anyone else running agents against a noisy CLI. If that's you, the output tax is real and you're almost certainly paying it. It's very killable.
— Parker Jones, parkerjones.dev