<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
    <title>Parker Jones Dev Blog - performance</title>
    <subtitle>Dev Blog of Parker Jones</subtitle>
    <link rel="self" type="application/atom+xml" href="https://parkerjones.dev/tags/performance/atom.xml"/>
    <link rel="alternate" type="text/html" href="https://parkerjones.dev"/>
    <generator uri="https://www.getzola.org/">Zola</generator>
    <updated>2026-06-26T00:00:00+00:00</updated>
    <id>https://parkerjones.dev/tags/performance/atom.xml</id>
    <entry xml:lang="en">
        <title>RTK: Cutting My Claude Code Token Bill by 90% with a Rust Proxy</title>
        <published>2026-06-26T00:00:00+00:00</published>
        <updated>2026-06-26T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://parkerjones.dev/posts/rtk-token-killer/"/>
        <id>https://parkerjones.dev/posts/rtk-token-killer/</id>
        
        <content type="html" xml:base="https://parkerjones.dev/posts/rtk-token-killer/">&lt;p&gt;I&#x27;ve been running coding agents against real work for a while now, and the thing nobody warns you about is the &lt;em&gt;output tax&lt;&#x2F;em&gt;. Every time the agent runs &lt;code&gt;git status&lt;&#x2F;code&gt;, &lt;code&gt;cargo test&lt;&#x2F;code&gt;, or &lt;code&gt;aws lambda list-functions&lt;&#x2F;code&gt;, the entire wall of text it gets back is fed into the model&#x27;s context. You pay for those tokens once when they arrive — and then again on every single turn after, because that output sits in the context window and gets re-sent until the conversation ends.&lt;&#x2F;p&gt;
&lt;p&gt;Most of that text is noise. ANSI color codes. Timestamps. Table columns I don&#x27;t care about. JSON fields I&#x27;ll never read. So I built a small Rust tool to strip it out before it ever reaches the model. I call it &lt;strong&gt;RTK&lt;&#x2F;strong&gt; — Rust Token Killer.&lt;&#x2F;p&gt;
&lt;p&gt;Here&#x27;s what it&#x27;s done across my last few thousand commands:&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code data-lang=&quot;text&quot;&gt;RTK Token Savings (Global Scope)
════════════════════════════════════════════════════════════
Total commands:    6283
Input tokens:      26.9M
Output tokens:     2.5M
Tokens saved:      24.5M (90.9%)
Total exec time:   462m34s (avg 4.4s)
Efficiency meter:  ██████████████████████░░ 90.9%
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Six thousand commands, &lt;strong&gt;24.5 million tokens saved — 90.9%&lt;&#x2F;strong&gt;. That&#x27;s not a microbenchmark; that&#x27;s my actual usage.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-hidden-cost-of-agent-tool-output&quot;&gt;The hidden cost of agent tool output&lt;&#x2F;h2&gt;
&lt;p&gt;When you use a chat model directly, you read its output and the cost stops there. When you put a model in an &lt;em&gt;agent loop&lt;&#x2F;em&gt;, the economics flip. The model runs a tool, the tool&#x27;s stdout&#x2F;stderr comes back as a tool result, and that result becomes part of the running transcript. Every subsequent turn re-sends the whole transcript. So a 4,000-token &lt;code&gt;cargo test&lt;&#x2F;code&gt; dump on turn 3 is still being paid for on turn 30.&lt;&#x2F;p&gt;
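&lt;p&gt;To put rough numbers on that, here is a back-of-the-envelope sketch in Rust. It assumes the simplest billing model: every turn re-sends the full transcript as input tokens, with no prompt-caching discount and no context truncation. Under that model, a tool result that lands on turn &lt;em&gt;t&lt;&#x2F;em&gt; is paid for on every turn from &lt;em&gt;t&lt;&#x2F;em&gt; through the last one. &lt;code&gt;resent_tokens&lt;&#x2F;code&gt; is an illustrative helper, not part of RTK.&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code data-lang=&quot;rust&quot;&gt;&#x2F;&#x2F;&#x2F; Input tokens billed for one tool result that enters the transcript on
&#x2F;&#x2F;&#x2F; `arrival_turn` and is re-sent on every turn through `last_turn` (inclusive).
&#x2F;&#x2F;&#x2F; Assumes no prompt-caching discount and no context truncation.
fn resent_tokens(result_tokens: u64, arrival_turn: u64, last_turn: u64) -&amp;gt; u64 {
    if last_turn &amp;lt; arrival_turn {
        return 0; &#x2F;&#x2F; the result never gets sent
    }
    let turns_carried = last_turn - arrival_turn + 1;
    result_tokens * turns_carried
}

fn main() {
    &#x2F;&#x2F; The example from the text: a 4,000-token `cargo test` dump on turn 3,
    &#x2F;&#x2F; in a conversation that runs to turn 30.
    let raw = resent_tokens(4_000, 3, 30);
    &#x2F;&#x2F; The same result shrunk to a hypothetical ~20-token one-line summary.
    let filtered = resent_tokens(20, 3, 30);
    println!(&quot;raw: {raw} tokens, filtered: {filtered} tokens&quot;);
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Under those assumptions, the single dump costs 4,000 × 28 = 112,000 input tokens over the conversation, compared with 560 for the hypothetical 20-token summary. What hurts is the multiplier, not the raw size.&lt;&#x2F;p&gt;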
&lt;p&gt;The fix isn&#x27;t &quot;run fewer commands.&quot; Agents &lt;em&gt;should&lt;&#x2F;em&gt; poke at the system constantly — that&#x27;s what makes them useful. The fix is to make each command&#x27;s output carry only the information the model actually needs to make its next decision.&lt;&#x2F;p&gt;
&lt;p&gt;That&#x27;s a filtering problem, and filtering is exactly the kind of thing Rust is good at: fast, streaming, zero-allocation where it counts, and cheap enough that wrapping every command in the agent&#x27;s hot loop adds no noticeable friction. The 4.4s average in the stats above is total execution time per command, and most of that is the underlying command itself, not the proxy.&lt;&#x2F;p&gt;
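&lt;p&gt;For a sense of what &quot;streaming&quot; means here, this is a minimal sketch of one common filter. It strips ANSI color codes line by line from standard input, so memory is bounded by the longest line rather than by the whole output. It&#x27;s an illustrative sketch using only the standard library, not RTK&#x27;s actual source. It handles only CSI sequences (&lt;code&gt;ESC [&lt;&#x2F;code&gt; followed by parameters and a final byte), which cover the usual color codes, and it assumes the output is valid UTF-8.&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code data-lang=&quot;rust&quot;&gt;use std::io::{self, BufRead, Write};

&#x2F;&#x2F;&#x2F; Remove ANSI CSI escape sequences (e.g. &quot;\x1b[31m&quot;) from one line.
&#x2F;&#x2F;&#x2F; A CSI sequence is ESC &#x27;[&#x27; plus parameter bytes, ending in a final
&#x2F;&#x2F;&#x2F; byte in the range 0x40..=0x7E (such as &#x27;m&#x27; for colors).
fn strip_ansi(line: &amp;amp;str) -&amp;gt; String {
    let mut out = String::with_capacity(line.len());
    let mut chars = line.chars().peekable();
    while let Some(c) = chars.next() {
        if c == &#x27;\x1b&#x27; &amp;amp;&amp;amp; chars.peek() == Some(&amp;amp;&#x27;[&#x27;) {
            chars.next(); &#x2F;&#x2F; consume &#x27;[&#x27;
            &#x2F;&#x2F; Skip parameter bytes up to and including the final byte.
            for d in chars.by_ref() {
                if (&#x27;\x40&#x27;..=&#x27;\x7e&#x27;).contains(&amp;amp;d) {
                    break;
                }
            }
        } else {
            out.push(c);
        }
    }
    out
}

&#x2F;&#x2F;&#x2F; Stream stdin to stdout one line at a time; errors on non-UTF-8 input.
fn main() -&amp;gt; io::Result&amp;lt;()&amp;gt; {
    let mut stdout = io::stdout().lock();
    for line in io::stdin().lock().lines() {
        writeln!(stdout, &quot;{}&quot;, strip_ansi(&amp;amp;line?))?;
    }
    Ok(())
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Because it works one line at a time, a filter like this can start emitting output before the wrapped command has finished.&lt;&#x2F;p&gt;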
&lt;h2 id=&quot;how-it-works-transparent-hook-based-rewriting&quot;&gt;How it works: transparent, hook-based rewriting&lt;&#x2F;h2&gt;
&lt;p&gt;The design goal was that I should never have to think about RTK. I don&#x27;t want to teach the agent to &quot;use rtk&quot; — I want every command it already runs to get filtered automatically.&lt;&#x2F;p&gt;
&lt;p&gt;So RTK plugs into Claude Code as a hook. When the agent decides to run &lt;code&gt;git status&lt;&#x2F;code&gt;, the hook rewrites it to &lt;code&gt;rtk git status&lt;&#x2F;code&gt; before execution. RTK runs the real command, reshapes the output, and hands back the lean version. The agent never knows the proxy is there, and the rewrite itself costs zero tokens.&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code data-lang=&quot;text&quot;&gt;git status        →   rtk git status      (transparent, 0 tokens overhead)
cargo test        →   rtk cargo test
aws lambda list…  →   rtk aws lambda list-functions
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
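&lt;p&gt;The rewrite step itself is plain string manipulation. Here is a minimal sketch, assuming a hypothetical allow-list of programs that have filters (&lt;code&gt;git&lt;&#x2F;code&gt;, &lt;code&gt;cargo&lt;&#x2F;code&gt;, &lt;code&gt;aws&lt;&#x2F;code&gt;, &lt;code&gt;gh&lt;&#x2F;code&gt;, &lt;code&gt;ps&lt;&#x2F;code&gt;). &lt;code&gt;FILTERED&lt;&#x2F;code&gt; and &lt;code&gt;rewrite_command&lt;&#x2F;code&gt; are illustrative names, not RTK&#x27;s API, and the plumbing that passes the command to and from Claude Code&#x27;s hook is left out.&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code data-lang=&quot;rust&quot;&gt;&#x2F;&#x2F;&#x2F; Hypothetical allow-list of programs that have a filter.
const FILTERED: &amp;amp;[&amp;amp;str] = &amp;amp;[&quot;git&quot;, &quot;cargo&quot;, &quot;aws&quot;, &quot;gh&quot;, &quot;ps&quot;];

&#x2F;&#x2F;&#x2F; Prefix a shell command with `rtk ` if its program has a filter.
&#x2F;&#x2F;&#x2F; Anything else, including commands that already start with `rtk`,
&#x2F;&#x2F;&#x2F; is returned unchanged, so unknown tools pass straight through.
fn rewrite_command(cmd: &amp;amp;str) -&amp;gt; String {
    let trimmed = cmd.trim_start();
    let program = trimmed.split_whitespace().next().unwrap_or(&quot;&quot;);
    if FILTERED.contains(&amp;amp;program) {
        format!(&quot;rtk {trimmed}&quot;)
    } else {
        cmd.to_string()
    }
}

fn main() {
    for cmd in [&quot;git status&quot;, &quot;cargo test&quot;, &quot;ls -la&quot;, &quot;rtk gain&quot;] {
        println!(&quot;{cmd} -&amp;gt; {}&quot;, rewrite_command(cmd));
    }
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;A real hook has more to cope with than this. Pipelines, chained commands, and environment-variable prefixes all defeat a first-word check.&lt;&#x2F;p&gt;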
&lt;p&gt;There are a handful of meta-commands I do call directly:&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code data-lang=&quot;bash&quot;&gt;rtk gain              # token-savings analytics (the table above)
rtk gain --history    # per-command history with savings
rtk discover          # mine Claude Code history for missed opportunities
rtk proxy &amp;lt;cmd&amp;gt;       # run a command raw, no filtering (escape hatch)
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;h2 id=&quot;where-the-savings-actually-come-from&quot;&gt;Where the savings actually come from&lt;&#x2F;h2&gt;
&lt;p&gt;Not every command saves the same amount, and that&#x27;s the interesting part. Breaking down my top commands by impact:&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code data-lang=&quot;text&quot;&gt;  #  Command                   Count   Saved    Avg%
 1.  rtk cargo test --work…        7   10.8M  100.0%
 2.  rtk aws lambda list-f…        9    3.1M   22.2%
 3.  rtk read                    602    2.7M   18.0%
 4.  rtk git push origin …         1    1.8M  100.0%
 5.  rtk cargo test --work…        1    1.8M  100.0%
 6.  rtk git push -u origi…        1    1.7M  100.0%
 7.  rtk aws cloudformatio…        2  324.0K   50.0%
 8.  rtk:toml ps -ef               6  298.1K   98.7%
 9.  rtk:toml ps aux               5  288.8K   98.5%
10.  rtk gh pr diff               18  110.7K   48.1%
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Three distinct categories show up here:&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Full suppression (100%).&lt;&#x2F;strong&gt; &lt;code&gt;cargo test --workspace&lt;&#x2F;code&gt; and &lt;code&gt;git push&lt;&#x2F;code&gt; produce enormous progress streams — compiler chatter, per-test lines, transfer counters — and the only thing the model needs is &lt;em&gt;did it pass&lt;&#x2F;em&gt; or &lt;em&gt;what failed&lt;&#x2F;em&gt;. When everything succeeds, the right answer is a one-line summary. Across the seven runs in the top row alone, that kept 10.8M tokens of raw test output out of context. That single command class is my biggest win by a mile.&lt;&#x2F;p&gt;
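&lt;p&gt;Here is one way to get that behavior, sketched in Rust. It reads the &lt;code&gt;test result:&lt;&#x2F;code&gt; line that libtest prints once per test binary and keeps only the failing-test lines. A fully green run collapses to one line, and anything unrecognized (such as a compile error) passes through untouched. This is an illustration, not RTK&#x27;s filter. It keys off libtest&#x27;s default human-readable output, which isn&#x27;t a stable interface.&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code data-lang=&quot;rust&quot;&gt;&#x2F;&#x2F;&#x2F; Collapse libtest output to what a model needs: failing tests, or a one-line pass summary.
&#x2F;&#x2F;&#x2F; Assumes libtest&#x27;s default human-readable format.
fn summarize_cargo_test(output: &amp;amp;str) -&amp;gt; String {
    let mut failures = Vec::new();
    let mut passed = 0u64;
    let mut all_ok = true;
    let mut saw_result = false;
    for line in output.lines().map(str::trim) {
        if line.starts_with(&quot;test &quot;) &amp;amp;&amp;amp; line.ends_with(&quot;... FAILED&quot;) {
            failures.push(line);
        } else if let Some(rest) = line.strip_prefix(&quot;test result: &quot;) {
            &#x2F;&#x2F; One summary per test binary, e.g. &quot;ok. 12 passed; 0 failed; ...&quot;
            saw_result = true;
            all_ok &amp;amp;= rest.starts_with(&quot;ok.&quot;);
            &#x2F;&#x2F; Add up the number that precedes &quot; passed&quot;.
            let count = rest
                .split(&#x27;;&#x27;)
                .find_map(|part| part.trim().strip_suffix(&quot; passed&quot;))
                .and_then(|p| p.rsplit(&#x27; &#x27;).next())
                .and_then(|n| n.parse::&amp;lt;u64&amp;gt;().ok());
            passed += count.unwrap_or(0);
        }
    }
    if saw_result &amp;amp;&amp;amp; all_ok &amp;amp;&amp;amp; failures.is_empty() {
        format!(&quot;cargo test: ok, {passed} passed&quot;)
    } else if !failures.is_empty() {
        failures.join(&quot;\n&quot;)
    } else {
        &#x2F;&#x2F; Nothing recognizable (e.g. a compile error): pass it through untouched.
        output.to_string()
    }
}

fn main() {
    let sample = &quot;running 1 test\ntest it_works ... ok\n\n\
                  test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out\n&quot;;
    println!(&quot;{}&quot;, summarize_cargo_test(sample));
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;A production filter would also keep each failure&#x27;s panic message from the &lt;code&gt;failures:&lt;&#x2F;code&gt; section, since &quot;what failed&quot; usually needs &quot;and why&quot; right behind it.&lt;&#x2F;p&gt;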
&lt;p&gt;&lt;strong&gt;Reshaping (98%).&lt;&#x2F;strong&gt; &lt;code&gt;ps -ef&lt;&#x2F;code&gt; and &lt;code&gt;ps aux&lt;&#x2F;code&gt; are wide, repetitive tables. Run them through RTK&#x27;s structured &lt;code&gt;:toml&lt;&#x2F;code&gt; output mode and you get a compact, parseable representation — 98%+ smaller, and &lt;em&gt;easier&lt;&#x2F;em&gt; for the model to reason about than a column-aligned ASCII table.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Trimming (18–50%).&lt;&#x2F;strong&gt; &lt;code&gt;read&lt;&#x2F;code&gt;, &lt;code&gt;gh pr diff&lt;&#x2F;code&gt;, and &lt;code&gt;aws&lt;&#x2F;code&gt; calls don&#x27;t get gutted — the content matters — but there&#x27;s still 18–50% of pure formatting cruft to shave. Note that &lt;code&gt;read&lt;&#x2F;code&gt; ran &lt;strong&gt;602 times&lt;&#x2F;strong&gt;. Modest per-call savings (about 4,500 tokens on average) add up to 2.7M tokens on a command that runs that often.&lt;&#x2F;p&gt;
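&lt;p&gt;Trimming is the gentlest of the three: it keeps every line of content and drops only layout. Here is a minimal sketch, assuming &quot;cruft&quot; means trailing whitespace and runs of blank lines. RTK&#x27;s actual per-command trimming rules aren&#x27;t spelled out here, and &lt;code&gt;trim_output&lt;&#x2F;code&gt; is a hypothetical name.&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code data-lang=&quot;rust&quot;&gt;&#x2F;&#x2F;&#x2F; Light-touch trim: strip trailing whitespace from every line and
&#x2F;&#x2F;&#x2F; collapse each run of blank lines into a single blank line.
fn trim_output(text: &amp;amp;str) -&amp;gt; String {
    let mut out = Vec::new();
    let mut prev_blank = false;
    for line in text.lines().map(str::trim_end) {
        let blank = line.is_empty();
        if blank &amp;amp;&amp;amp; prev_blank {
            continue; &#x2F;&#x2F; this run already produced one blank line
        }
        out.push(line);
        prev_blank = blank;
    }
    out.join(&quot;\n&quot;)
}

fn main() {
    let raw = &quot;diff --git a&#x2F;x b&#x2F;x   \n\n\n\n+added line\t\n&quot;;
    println!(&quot;{}&quot;, trim_output(raw));
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;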
&lt;p&gt;The lesson I keep relearning: optimize the things you do constantly (&lt;code&gt;read&lt;&#x2F;code&gt;) &lt;em&gt;and&lt;&#x2F;em&gt; the things that dump enormous one-shot payloads (&lt;code&gt;cargo test&lt;&#x2F;code&gt;). The middle is where most of the volume hides.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;structured-output-with-toml-mode&quot;&gt;Structured output with &lt;code&gt;:toml&lt;&#x2F;code&gt; mode&lt;&#x2F;h2&gt;
&lt;p&gt;The &lt;code&gt;:toml&lt;&#x2F;code&gt; variants above aren&#x27;t a gimmick. ASCII tables are optimized for &lt;em&gt;human eyes&lt;&#x2F;em&gt; — alignment, separators, headers repeated for readability. A model doesn&#x27;t need any of that; it needs keys and values. Emitting &lt;code&gt;ps aux&lt;&#x2F;code&gt; as compact TOML drops the token count by ~98% &lt;strong&gt;and&lt;&#x2F;strong&gt; removes the ambiguity of parsing whitespace-aligned columns. Cheaper and more reliable at the same time is a rare trade to win.&lt;&#x2F;p&gt;
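&lt;p&gt;To make that concrete, here is a sketch that turns &lt;code&gt;ps aux&lt;&#x2F;code&gt; output into a TOML array of tables. This post doesn&#x27;t show the schema RTK actually emits, so the fields kept here (&lt;code&gt;pid&lt;&#x2F;code&gt;, &lt;code&gt;cpu&lt;&#x2F;code&gt;, &lt;code&gt;mem&lt;&#x2F;code&gt;, &lt;code&gt;cmd&lt;&#x2F;code&gt;) are an assumption, as is hand-rolling the TOML instead of using a crate. The savings in this sketch come from dropping columns, and it isn&#x27;t meant to reproduce RTK&#x27;s exact 98%.&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code data-lang=&quot;rust&quot;&gt;&#x2F;&#x2F;&#x2F; Quote a value as a TOML basic string. Escapes only backslashes and double
&#x2F;&#x2F;&#x2F; quotes; assumes no control characters, which `ps` output normally lacks.
fn toml_str(s: &amp;amp;str) -&amp;gt; String {
    format!(&quot;\&quot;{}\&quot;&quot;, s.replace(&#x27;\\&#x27;, &quot;\\\\&quot;).replace(&#x27;&quot;&#x27;, &quot;\\\&quot;&quot;))
}

&#x2F;&#x2F;&#x2F; Convert `ps aux` output into one `[[proc]]` table per process.
&#x2F;&#x2F;&#x2F; Columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND.
fn ps_aux_to_toml(output: &amp;amp;str) -&amp;gt; String {
    let mut toml = String::new();
    for line in output.lines().skip(1) {
        &#x2F;&#x2F; skip(1) drops the header row
        let fields: Vec&amp;lt;&amp;amp;str&amp;gt; = line.split_whitespace().collect();
        if fields.len() &amp;lt; 11 {
            continue; &#x2F;&#x2F; not a process row
        }
        let cmd = fields[10..].join(&quot; &quot;); &#x2F;&#x2F; COMMAND is last and may contain spaces
        toml.push_str(&amp;amp;format!(
            &quot;[[proc]]\npid = {}\ncpu = {}\nmem = {}\ncmd = {}\n&quot;,
            fields[1], fields[2], fields[3], toml_str(&amp;amp;cmd)
        ));
    }
    toml
}

fn main() {
    let sample = &quot;USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND\n\
                  root 1 0.0 0.1 168000 11000 ? Ss 09:00 0:02 &#x2F;sbin&#x2F;init splash\n&quot;;
    print!(&quot;{}&quot;, ps_aux_to_toml(sample));
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;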
&lt;h2 id=&quot;finding-what-you-re-missing-rtk-discover&quot;&gt;Finding what you&#x27;re missing: &lt;code&gt;rtk discover&lt;&#x2F;code&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;The one I reach for when I want to tune things is &lt;code&gt;rtk discover&lt;&#x2F;code&gt;. It reads back through my Claude Code history and flags commands that burned tokens but aren&#x27;t being proxied yet — the long tail of &quot;oh, I run &lt;em&gt;that&lt;&#x2F;em&gt; a lot.&quot; It turns token optimization into a feedback loop instead of a guessing game: ship a filter, run for a week, ask &lt;code&gt;discover&lt;&#x2F;code&gt; what&#x27;s still expensive, repeat.&lt;&#x2F;p&gt;
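&lt;p&gt;At its core, a &lt;code&gt;discover&lt;&#x2F;code&gt;-style report is a group-by. Here is a minimal sketch, assuming the history has already been parsed into (command, output tokens) pairs. How RTK actually reads Claude Code&#x27;s history isn&#x27;t covered here, and &lt;code&gt;missed_opportunities&lt;&#x2F;code&gt; and the sample history are made up for illustration. The sketch skips anything already routed through &lt;code&gt;rtk&lt;&#x2F;code&gt;, buckets the rest by program name, and ranks by total tokens.&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code data-lang=&quot;rust&quot;&gt;use std::collections::HashMap;

&#x2F;&#x2F;&#x2F; Rank programs that are not yet proxied by the total output tokens they produced.
&#x2F;&#x2F;&#x2F; Returns (program, total tokens, number of runs), most expensive first.
fn missed_opportunities(history: &amp;amp;[(&amp;amp;str, u64)]) -&amp;gt; Vec&amp;lt;(String, u64, usize)&amp;gt; {
    let mut totals: HashMap&amp;lt;&amp;amp;str, (u64, usize)&amp;gt; = HashMap::new();
    for &amp;amp;(cmd, tokens) in history {
        let program = cmd.split_whitespace().next().unwrap_or(&quot;&quot;);
        if program.is_empty() || program == &quot;rtk&quot; {
            continue; &#x2F;&#x2F; empty, or already filtered
        }
        let entry = totals.entry(program).or_insert((0, 0));
        entry.0 += tokens; &#x2F;&#x2F; total tokens
        entry.1 += 1; &#x2F;&#x2F; number of runs
    }
    let mut ranked: Vec&amp;lt;(String, u64, usize)&amp;gt; = totals
        .into_iter()
        .map(|(prog, (tokens, runs))| (prog.to_string(), tokens, runs))
        .collect();
    &#x2F;&#x2F; Most expensive first; ties broken by name so the order is deterministic.
    ranked.sort_by(|a, b| b.1.cmp(&amp;amp;a.1).then_with(|| a.0.cmp(&amp;amp;b.0)));
    ranked
}

fn main() {
    &#x2F;&#x2F; Made-up history for illustration.
    let history: [(&amp;amp;str, u64); 4] = [
        (&quot;rtk git status&quot;, 40),
        (&quot;kubectl get pods -A&quot;, 9_000),
        (&quot;terraform plan&quot;, 25_000),
        (&quot;kubectl describe pod web&quot;, 6_000),
    ];
    for (program, tokens, runs) in missed_opportunities(&amp;amp;history) {
        println!(&quot;{program}: {tokens} tokens over {runs} runs&quot;);
    }
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;With that sample, &lt;code&gt;terraform&lt;&#x2F;code&gt; (25,000 tokens in one run) ranks above &lt;code&gt;kubectl&lt;&#x2F;code&gt; (15,000 tokens over two runs). That ranking tells you which filter to write next.&lt;&#x2F;p&gt;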
&lt;h2 id=&quot;when-not-to-filter&quot;&gt;When &lt;em&gt;not&lt;&#x2F;em&gt; to filter&lt;&#x2F;h2&gt;
&lt;p&gt;Filtering is lossy by definition, and sometimes I genuinely need the raw bytes — debugging a tool whose exact output format is the thing I&#x27;m investigating, or chasing a bug &lt;em&gt;in a filter itself&lt;&#x2F;em&gt;. That&#x27;s what &lt;code&gt;rtk proxy &amp;lt;cmd&amp;gt;&lt;&#x2F;code&gt; is for: run it completely untouched. Having a clean escape hatch is what makes aggressive default filtering safe. If the trim ever hides something I needed, I&#x27;m one command away from the truth.&lt;&#x2F;p&gt;
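&lt;p&gt;An escape hatch like that is deliberately boring. As a sketch (not RTK&#x27;s source), running a command untouched in Rust means spawning it with inherited stdio, so every byte goes straight through, and returning its exit code. &lt;code&gt;proxy_raw&lt;&#x2F;code&gt; is a hypothetical name.&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code data-lang=&quot;rust&quot;&gt;use std::io;
use std::process::Command;

&#x2F;&#x2F;&#x2F; Run `program` with `args` exactly as given. `Command::status` inherits
&#x2F;&#x2F;&#x2F; stdin, stdout, and stderr by default, so nothing is captured or filtered.
&#x2F;&#x2F;&#x2F; Returns the child&#x27;s exit code, or -1 if it was terminated by a signal.
fn proxy_raw(program: &amp;amp;str, args: &amp;amp;[&amp;amp;str]) -&amp;gt; io::Result&amp;lt;i32&amp;gt; {
    let status = Command::new(program).args(args).status()?;
    Ok(status.code().unwrap_or(-1))
}

fn main() -&amp;gt; io::Result&amp;lt;()&amp;gt; {
    let code = proxy_raw(&quot;echo&quot;, &amp;amp;[&quot;raw output, untouched&quot;])?;
    std::process::exit(code);
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The design choice worth copying is what it leaves out. With no capture and no buffering, even interactive or very large output behaves exactly as if RTK weren&#x27;t there.&lt;&#x2F;p&gt;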
&lt;p&gt;One footgun worth flagging: there&#x27;s a name collision out there. If &lt;code&gt;rtk gain&lt;&#x2F;code&gt; fails with &quot;command not found,&quot; you&#x27;ve probably got &lt;code&gt;reachingforthejack&#x2F;rtk&lt;&#x2F;code&gt; (a Rust Type Kit) on your &lt;code&gt;PATH&lt;&#x2F;code&gt; instead. Running &lt;code&gt;which rtk&lt;&#x2F;code&gt; shows which binary you&#x27;re actually invoking.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;what-i-learned&quot;&gt;What I learned&lt;&#x2F;h2&gt;
&lt;p&gt;I went in thinking of this as a cost optimization, and it is — but the bigger payoff turned out to be &lt;strong&gt;context window hygiene&lt;&#x2F;strong&gt;. Tokens are cheap-ish; &lt;em&gt;context is scarce&lt;&#x2F;em&gt;. Every line of noise I strip is a line that isn&#x27;t crowding out something the model actually needs to remember three turns from now. The 90% I&#x27;m not paying for is nice. The 90% that isn&#x27;t diluting the agent&#x27;s attention is the real win.&lt;&#x2F;p&gt;
&lt;p&gt;Next up: smarter, per-command filter profiles (a &lt;code&gt;cargo test&lt;&#x2F;code&gt; filter shouldn&#x27;t look anything like an &lt;code&gt;aws&lt;&#x2F;code&gt; filter), and packaging RTK so it&#x27;s a one-line install for anyone else running agents against a noisy CLI. If that&#x27;s you, the output tax is real and you&#x27;re almost certainly paying it. It&#x27;s very killable.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;em&gt;— Parker Jones, &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;parkerjones.dev&quot;&gt;parkerjones.dev&lt;&#x2F;a&gt;&lt;&#x2F;em&gt;&lt;&#x2F;p&gt;
</content>
        
    </entry>
</feed>
