<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
    <title>Parker Jones Dev Blog - tooling</title>
    <subtitle>Dev Blog of Parker Jones</subtitle>
    <link rel="self" type="application/atom+xml" href="https://parkerjones.dev/tags/tooling/atom.xml"/>
    <link rel="alternate" type="text/html" href="https://parkerjones.dev"/>
    <generator uri="https://www.getzola.org/">Zola</generator>
    <updated>2026-06-26T00:00:00+00:00</updated>
    <id>https://parkerjones.dev/tags/tooling/atom.xml</id>
    <entry xml:lang="en">
        <title>How I Cut My Claude Code Token Usage by 90% with RTK</title>
        <published>2026-06-26T00:00:00+00:00</published>
        <updated>2026-06-26T00:00:00+00:00</updated>
        
        <author>
          <name>Parker Jones</name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://parkerjones.dev/posts/rtk-token-killer/"/>
        <id>https://parkerjones.dev/posts/rtk-token-killer/</id>
        
        <content type="html" xml:base="https://parkerjones.dev/posts/rtk-token-killer/">&lt;p&gt;I&#x27;ve been running coding agents against real work for a while now, and the thing nobody warns you about is the &lt;em&gt;output tax&lt;&#x2F;em&gt;. Every time the agent runs &lt;code&gt;git status&lt;&#x2F;code&gt;, &lt;code&gt;cargo test&lt;&#x2F;code&gt;, or &lt;code&gt;aws lambda list-functions&lt;&#x2F;code&gt;, the entire wall of text it gets back is fed into the model&#x27;s context. You pay for those tokens once when they arrive — and then again on every single turn after, because that output sits in the context window and gets re-sent until the conversation ends.&lt;&#x2F;p&gt;
&lt;p&gt;Most of that text is noise. ANSI color codes. Timestamps. Table columns I don&#x27;t care about. JSON fields I&#x27;ll never read. So I started routing every command through &lt;strong&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.rtk-ai.app&#x2F;&quot;&gt;RTK&lt;&#x2F;a&gt;&lt;&#x2F;strong&gt; (Rust Token Killer) — an open-source CLI proxy, written in Rust, built to strip exactly this kind of bloat before it reaches the model. It&#x27;s a &lt;code&gt;brew install rtk&lt;&#x2F;code&gt; away.&lt;&#x2F;p&gt;
&lt;p&gt;Here&#x27;s what it&#x27;s done across my last few thousand commands:&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code data-lang=&quot;text&quot;&gt;RTK Token Savings (Global Scope)
════════════════════════════════════════════════════════════
Total commands:    6283
Input tokens:      26.9M
Output tokens:     2.5M
Tokens saved:      24.5M (90.9%)
Total exec time:   462m34s (avg 4.4s)
Efficiency meter:  ██████████████████████░░ 90.9%
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Six thousand commands of &lt;em&gt;my&lt;&#x2F;em&gt; actual usage, &lt;strong&gt;24.5 million tokens saved — 90.9%&lt;&#x2F;strong&gt;. That&#x27;s not a vendor benchmark; it&#x27;s my own &lt;code&gt;rtk gain&lt;&#x2F;code&gt; report.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-hidden-cost-of-agent-tool-output&quot;&gt;The hidden cost of agent tool output&lt;&#x2F;h2&gt;
&lt;p&gt;When you use a chat model directly, you read its output and the cost stops there. When you put a model in an &lt;em&gt;agent loop&lt;&#x2F;em&gt;, the economics flip. The model runs a tool, the tool&#x27;s stdout&#x2F;stderr comes back as a tool result, and that result becomes part of the running transcript. Every subsequent turn re-sends the whole transcript. So a 4,000-token &lt;code&gt;cargo test&lt;&#x2F;code&gt; dump on turn 3 is still being paid for on turn 30.&lt;&#x2F;p&gt;
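&lt;p&gt;To put a number on that compounding, here is a back-of-the-envelope helper. It is a sketch that rests on simplifying assumptions of my own: the tool result is re-sent in full on every turn from the one where it lands through the last turn, and prompt caching is ignored. The function name is hypothetical.&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code data-lang=&quot;bash&quot;&gt;# Hypothetical helper: total input tokens spent carrying one tool result,
# assuming it is re-sent on every turn from first_turn through last_turn
# (inclusive) and ignoring prompt caching.
resent_tokens() {
  local tokens=$1 first_turn=$2 last_turn=$3
  echo $(( tokens * (last_turn - first_turn + 1) ))
}

resent_tokens 4000 3 30   # 4000 tokens x 28 turns = 112000
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Under those assumptions, that one 4,000-token dump has cost 112,000 input tokens by turn 30, which is 28 times its own size.&lt;&#x2F;p&gt;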
&lt;p&gt;The fix isn&#x27;t &quot;run fewer commands.&quot; Agents &lt;em&gt;should&lt;&#x2F;em&gt; poke at the system constantly — that&#x27;s what makes them useful. The fix is to make each command&#x27;s output carry only the information the model actually needs to make its next decision.&lt;&#x2F;p&gt;
&lt;p&gt;That&#x27;s a filtering problem, and filtering is exactly the kind of thing Rust is good at: fast, streaming, and cheap enough that wrapping every command in the agent&#x27;s hot loop adds no noticeable friction. In my usage, commands routed through RTK average 4.4s end to end, and most of that is the underlying command itself, not the proxy.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;how-it-works-transparent-hook-based-rewriting&quot;&gt;How it works: transparent, hook-based rewriting&lt;&#x2F;h2&gt;
&lt;p&gt;What sold me on it is that I never have to think about it. I don&#x27;t want to teach the agent to &quot;use rtk&quot; — I want every command it already runs to get filtered automatically.&lt;&#x2F;p&gt;
&lt;p&gt;So I wired RTK into Claude Code as a hook. When the agent decides to run &lt;code&gt;git status&lt;&#x2F;code&gt;, the hook rewrites it to &lt;code&gt;rtk git status&lt;&#x2F;code&gt; before execution. RTK runs the real command, reshapes the output, and hands back the lean version. The agent never knows the proxy is there, and the rewrite itself costs zero tokens.&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code data-lang=&quot;text&quot;&gt;git status        →   rtk git status      (transparent, 0 tokens overhead)
cargo test        →   rtk cargo test
aws lambda list…  →   rtk aws lambda list-functions
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
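&lt;p&gt;Conceptually, the rewrite is a single prefix decision. Here is a minimal sketch of that decision as a shell function, assuming a hand-picked list of commands worth proxying. The function name and the list are my own illustration, not RTK&#x27;s actual hook or its real set of supported commands, and the Claude Code hook plumbing that passes the command in is left out.&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code data-lang=&quot;bash&quot;&gt;# Hypothetical rewrite rule: prefix "rtk" onto commands whose first word is
# in an illustrative allow-list; everything else passes through unchanged.
rewrite_command() {
  local cmd=$1
  local first_word=${cmd%% *}   # everything before the first space
  case $first_word in
    git|cargo|aws|gh|ps)
      printf 'rtk %s\n' "$cmd" ;;
    *)
      printf '%s\n' "$cmd" ;;
  esac
}

rewrite_command "git status"       # prints: rtk git status
rewrite_command "rtk git status"   # prints: rtk git status
rewrite_command "ls -la"           # prints: ls -la
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Because &lt;code&gt;rtk&lt;&#x2F;code&gt; itself isn&#x27;t in the list, applying the rule twice is harmless: an already-rewritten command comes back unchanged.&lt;&#x2F;p&gt;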
&lt;p&gt;There are a handful of meta-commands I call directly:&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code data-lang=&quot;bash&quot;&gt;rtk gain              # token-savings analytics (the table above)
rtk gain --history    # per-command history with savings
rtk discover          # mine Claude Code history for missed opportunities
rtk proxy &amp;lt;cmd&amp;gt;       # run a command raw, no filtering (escape hatch)
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;h2 id=&quot;where-the-savings-actually-come-from&quot;&gt;Where the savings actually come from&lt;&#x2F;h2&gt;
&lt;p&gt;Not every command saves the same amount, and that&#x27;s the interesting part. Breaking down my top commands by impact:&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code data-lang=&quot;text&quot;&gt;  #  Command                   Count   Saved    Avg%
 1.  rtk cargo test --work…        7   10.8M  100.0%
 2.  rtk aws lambda list-f…        9    3.1M   22.2%
 3.  rtk read                    602    2.7M   18.0%
 4.  rtk git push origin …         1    1.8M  100.0%
 5.  rtk cargo test --work…        1    1.8M  100.0%
 6.  rtk git push -u origi…        1    1.7M  100.0%
 7.  rtk aws cloudformatio…        2  324.0K   50.0%
 8.  rtk:toml ps -ef               6  298.1K   98.7%
 9.  rtk:toml ps aux               5  288.8K   98.5%
10.  rtk gh pr diff               18  110.7K   48.1%
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Three distinct categories show up here:&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Full suppression (100%).&lt;&#x2F;strong&gt; &lt;code&gt;cargo test --workspace&lt;&#x2F;code&gt; and &lt;code&gt;git push&lt;&#x2F;code&gt; produce enormous progress streams (compiler chatter, per-test lines, transfer counters), and the only thing the model needs is &lt;em&gt;did it pass&lt;&#x2F;em&gt; or &lt;em&gt;what failed&lt;&#x2F;em&gt;. When everything succeeds, the right answer is a one-line summary; in the top row alone, seven runs kept 10.8M tokens of raw test output out of context. That single command class is my biggest win by a mile.&lt;&#x2F;p&gt;
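&lt;p&gt;The shape of that behavior is easy to sketch. This is not RTK&#x27;s actual &lt;code&gt;cargo&lt;&#x2F;code&gt; filter, just a minimal illustration that assumes the exit status is a trustworthy pass&#x2F;fail signal and that failure lines contain &lt;code&gt;FAILED&lt;&#x2F;code&gt; or &lt;code&gt;panicked&lt;&#x2F;code&gt;, or start with &lt;code&gt;error&lt;&#x2F;code&gt;. The function name and patterns are hypothetical.&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code data-lang=&quot;bash&quot;&gt;# Hypothetical summarizer: run a test command, collapse a passing run to one
# line, and keep only failure-looking lines from a failing run.
summarize_tests() {
  local output status
  output=$("$@" 2&amp;gt;&amp;amp;1)   # capture stdout and stderr together
  status=$?
  if [ "$status" -eq 0 ]; then
    printf 'ok: %s\n' "$*"
  else
    printf '%s\n' "$output" | grep -E 'FAILED|panicked|^error'
    printf 'FAILED: %s (exit %s)\n' "$*" "$status"
  fi
  return "$status"
}

summarize_tests cargo test --workspace   # on success prints: ok: cargo test --workspace
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;A passing run, however long its log, becomes one line; a failing run keeps the evidence plus the exit code.&lt;&#x2F;p&gt;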
&lt;p&gt;&lt;strong&gt;Reshaping (98%).&lt;&#x2F;strong&gt; &lt;code&gt;ps -ef&lt;&#x2F;code&gt; and &lt;code&gt;ps aux&lt;&#x2F;code&gt; are wide, repetitive tables. Run them through RTK&#x27;s structured &lt;code&gt;:toml&lt;&#x2F;code&gt; output mode and you get a compact, parseable representation — 98%+ smaller, and &lt;em&gt;easier&lt;&#x2F;em&gt; for the model to reason about than a column-aligned ASCII table.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Trimming (18–50%).&lt;&#x2F;strong&gt; &lt;code&gt;read&lt;&#x2F;code&gt;, &lt;code&gt;gh pr diff&lt;&#x2F;code&gt;, and &lt;code&gt;aws&lt;&#x2F;code&gt; calls don&#x27;t get gutted, because the content matters, but there&#x27;s still 18–50% of pure formatting cruft to shave. Note that &lt;code&gt;read&lt;&#x2F;code&gt; ran &lt;strong&gt;602 times&lt;&#x2F;strong&gt;: small per-call savings on a high-frequency command add up to 2.7M tokens.&lt;&#x2F;p&gt;
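&lt;p&gt;The most mechanical kind of trimming is dropping terminal styling that means nothing to a model. A minimal sketch, assuming only ANSI color and style sequences (&lt;code&gt;ESC [ ... m&lt;&#x2F;code&gt;) need removing; the function name is hypothetical, and this covers just one kind of cruft rather than reproducing RTK&#x27;s implementation.&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code data-lang=&quot;bash&quot;&gt;# Hypothetical filter: delete ANSI SGR sequences (colors, bold, reset),
# i.e. ESC, "[", digits and semicolons, "m", from stdin.
strip_ansi() {
  local esc=$'\033'   # the literal ESC character
  sed -E "s/${esc}\[[0-9;]*m//g"
}

printf '\033[31merror\033[0m: build failed\n' | strip_ansi   # prints: error: build failed
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;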
&lt;p&gt;The lesson I keep relearning: the things you do constantly (&lt;code&gt;read&lt;&#x2F;code&gt;) &lt;em&gt;and&lt;&#x2F;em&gt; the things that dump enormous one-shot payloads (&lt;code&gt;cargo test&lt;&#x2F;code&gt;) both matter. So does the middle: &lt;code&gt;aws lambda list-functions&lt;&#x2F;code&gt;, run only nine times, saved 3.1M tokens.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;structured-output-with-toml-mode&quot;&gt;Structured output with &lt;code&gt;:toml&lt;&#x2F;code&gt; mode&lt;&#x2F;h2&gt;
&lt;p&gt;The &lt;code&gt;:toml&lt;&#x2F;code&gt; variants above aren&#x27;t a gimmick. ASCII tables are optimized for &lt;em&gt;human eyes&lt;&#x2F;em&gt; — alignment, separators, headers repeated for readability. A model doesn&#x27;t need any of that; it needs keys and values. Emitting &lt;code&gt;ps aux&lt;&#x2F;code&gt; as compact TOML drops the token count by ~98% &lt;strong&gt;and&lt;&#x2F;strong&gt; removes the ambiguity of parsing whitespace-aligned columns. Cheaper and more reliable at the same time is a rare trade to win.&lt;&#x2F;p&gt;
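&lt;p&gt;To make the columns-to-keys idea concrete, here is a minimal sketch that turns any whitespace-aligned table with a header row into TOML array-of-tables entries. The function name and output layout are my own, not RTK&#x27;s &lt;code&gt;:toml&lt;&#x2F;code&gt; schema; it only shows the change of shape and makes no attempt at RTK&#x27;s size reduction.&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code data-lang=&quot;bash&quot;&gt;# Hypothetical converter: the header row becomes the keys and every later row
# becomes one [[name]] table. The last column absorbs the rest of the line
# (e.g. a command with arguments). Values containing double quotes are not
# escaped, so this is a sketch, not a general TOML writer.
table_to_toml() {
  awk -v name="$1" '
    NR == 1 {
      n = NF
      for (i = 1; i &amp;lt;= NF; i++) {
        k = tolower($i)
        gsub(/[^a-z0-9_-]/, "_", k)   # e.g. %CPU becomes _cpu
        key[i] = k
      }
      next
    }
    {
      printf "[[%s]]\n", name
      for (i = 1; i &amp;lt; n; i++) printf "%s = \"%s\"\n", key[i], $i
      rest = $n
      for (i = n + 1; i &amp;lt;= NF; i++) rest = rest " " $i
      printf "%s = \"%s\"\n", key[n], rest
    }'
}

ps -eo pid,comm | table_to_toml process
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Each process becomes a handful of key&#x2F;value lines, so nothing downstream has to count whitespace to figure out where a column starts.&lt;&#x2F;p&gt;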
&lt;h2 id=&quot;finding-what-you-re-missing-rtk-discover&quot;&gt;Finding what you&#x27;re missing: &lt;code&gt;rtk discover&lt;&#x2F;code&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;The one I reach for when I want to tune things is &lt;code&gt;rtk discover&lt;&#x2F;code&gt;. It reads back through my Claude Code history and flags commands that burned tokens but aren&#x27;t being proxied yet — the long tail of &quot;oh, I run &lt;em&gt;that&lt;&#x2F;em&gt; a lot.&quot; It turns token optimization into a feedback loop instead of a guessing game: see what&#x27;s expensive, make sure it&#x27;s routed through RTK, check &lt;code&gt;gain&lt;&#x2F;code&gt; a week later, repeat.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;when-not-to-filter&quot;&gt;When &lt;em&gt;not&lt;&#x2F;em&gt; to filter&lt;&#x2F;h2&gt;
&lt;p&gt;Filtering is lossy by definition, and sometimes I genuinely need the raw bytes — debugging a tool whose exact output format is the thing I&#x27;m investigating. That&#x27;s what &lt;code&gt;rtk proxy &amp;lt;cmd&amp;gt;&lt;&#x2F;code&gt; is for: run it completely untouched. Having a clean escape hatch is what makes aggressive default filtering safe. If a trim ever hides something I needed, I&#x27;m one command away from the truth.&lt;&#x2F;p&gt;
&lt;p&gt;One footgun worth flagging: there&#x27;s a name collision out there. If &lt;code&gt;rtk gain&lt;&#x2F;code&gt; errors with &quot;command not found,&quot; you&#x27;ve probably got &lt;code&gt;reachingforthejack&#x2F;rtk&lt;&#x2F;code&gt; (a Rust Type Kit) on your &lt;code&gt;PATH&lt;&#x2F;code&gt; instead of the token killer. &lt;code&gt;which rtk&lt;&#x2F;code&gt; sorts it out — the one you want is the &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;formulae.brew.sh&#x2F;formula&#x2F;rtk&quot;&gt;Homebrew &lt;code&gt;rtk&lt;&#x2F;code&gt; formula&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;what-i-learned&quot;&gt;What I learned&lt;&#x2F;h2&gt;
&lt;p&gt;I started using RTK thinking of it as a cost optimization, and it is — but the bigger payoff turned out to be &lt;strong&gt;context window hygiene&lt;&#x2F;strong&gt;. Tokens are cheap-ish; &lt;em&gt;context is scarce&lt;&#x2F;em&gt;. Every line of noise it strips is a line that isn&#x27;t crowding out something the model actually needs to remember three turns from now. The 90% I&#x27;m not paying for is nice. The 90% that isn&#x27;t diluting the agent&#x27;s attention is the real win.&lt;&#x2F;p&gt;
&lt;p&gt;If you&#x27;re running agents against a noisy CLI, the output tax is real and you&#x27;re almost certainly paying it — and it&#x27;s very killable. RTK lives at &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.rtk-ai.app&#x2F;&quot;&gt;rtk-ai.app&lt;&#x2F;a&gt;; &lt;code&gt;brew install rtk&lt;&#x2F;code&gt; and &lt;code&gt;rtk gain&lt;&#x2F;code&gt; will tell you your own number.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;em&gt;— Parker Jones, &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;parkerjones.dev&quot;&gt;parkerjones.dev&lt;&#x2F;a&gt;&lt;&#x2F;em&gt;&lt;&#x2F;p&gt;
</content>
        
    </entry>
</feed>
