RTK: Cutting My Claude Code Token Bill by 90% with a Rust Proxy

2026-06-26T00:00:00+00:00

I've been running coding agents against real work for a while now, and the thing nobody warns you about is the output tax. Every time the agent runs git status, cargo test, or aws lambda list-functions, the entire wall of text it gets back is fed into the model's context. You pay for those tokens once when they arrive, and then again on every single turn after, because that output sits in the context window and gets re-sent until the conversation ends.

Most of that text is noise. ANSI color codes. Timestamps. Table columns I don't care about. JSON fields I'll never read. So I built a small Rust tool to strip it out before it ever reaches the model. I call it RTK — Rust Token Killer.

Here's what it's done across my last few thousand commands:

RTK Token Savings (Global Scope)
════════════════════════════════════════════════════════════
Total commands:    6283
Input tokens:      26.9M
Output tokens:     2.5M
Tokens saved:      24.5M (90.9%)
Total exec time:   462m34s (avg 4.4s)
Efficiency meter:  ██████████████████████░░ 90.9%
Six thousand commands, 24.5 million tokens saved: 90.9%. That's not a microbenchmark; that's my actual usage.
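The summary line is simple arithmetic. Here's a minimal Rust sketch of it, assuming "Input tokens" counts what the raw commands would have produced and "Output tokens" what actually reached the model. The `savings` and `meter` names, the bar width, and the rounding rule are illustrative guesses, not RTK's code.

```rust
/// Tokens saved and percentage saved, given raw (input) and filtered (output) counts.
fn savings(input_tokens: u64, output_tokens: u64) -> (u64, f64) {
    let saved = input_tokens.saturating_sub(output_tokens);
    let pct = if input_tokens == 0 {
        0.0
    } else {
        saved as f64 * 100.0 / input_tokens as f64
    };
    (saved, pct)
}

/// Render a bar of `width` cells, filled in proportion to `pct` (rounded to whole cells).
fn meter(pct: f64, width: usize) -> String {
    let filled = ((pct / 100.0) * width as f64).round().clamp(0.0, width as f64) as usize;
    "█".repeat(filled) + &"░".repeat(width - filled)
}

fn main() {
    let (saved, pct) = savings(1_000, 100);
    println!("Tokens saved:      {saved} ({pct:.1}%)");
    println!("Efficiency meter:  {} {pct:.1}%", meter(pct, 24));
}
```

With a 24-cell bar, 90% rounds to 22 filled cells (0.9 × 24 = 21.6).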
The hidden cost of agent tool output
When you use a chat model directly, you read its output and the cost stops there. When you put a model in an agent loop, the economics flip. The model runs a tool, the tool's stdout/stderr comes back as a tool result, and that result becomes part of the running transcript. Every subsequent turn re-sends the whole transcript. So a 4,000-token cargo test dump on turn 3 is still being paid for on turn 30.
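The compounding is easy to put numbers on. This sketch assumes the simple model described above: a tool result is re-sent in full on every turn until the conversation ends, with no caching or truncation. The function name is hypothetical.

```rust
/// Total input tokens billed for one tool result that stays in the transcript,
/// assuming it is re-sent in full on every turn from `first_turn` to `last_turn`
/// (inclusive) and nothing is cached or truncated.
fn cumulative_tokens(result_tokens: u64, first_turn: u64, last_turn: u64) -> u64 {
    if last_turn < first_turn {
        return 0;
    }
    result_tokens * (last_turn - first_turn + 1)
}

fn main() {
    let raw = cumulative_tokens(4_000, 3, 30);
    // The same result after a 90% trim.
    let filtered = cumulative_tokens(400, 3, 30);
    println!("raw: {raw} tokens, filtered: {filtered} tokens");
}
```

For the example in the paragraph, turns 3 through 30 inclusive are 28 sends, so the 4,000-token dump bills 112,000 input tokens in total; trimmed by 90% to 400 tokens, it would bill 11,200.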
The fix isn't "run fewer commands." Agents should poke at the system constantly; that's what makes them useful. The fix is to make each command's output carry only the information the model actually needs to make its next decision.
That's a filtering problem, and filtering is exactly the kind of thing Rust is good at: fast, streaming, zero-allocation where it counts, and cheap enough that wrapping every command in the agent's hot loop adds no friction. RTK averages 4.4s per command, and most of that is the underlying command itself, not the proxy.
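To make "streaming" concrete, here's a minimal Rust sketch of one of the simplest filters: stripping ANSI color codes line by line from stdin to stdout. It's an illustration under stated assumptions, not RTK's implementation. It only removes CSI sequences (ESC, `[`, parameter bytes, one final byte), and for clarity it allocates one `String` per line rather than chasing zero allocation.

```rust
use std::io::{self, BufRead, BufWriter, Write};

/// Remove ANSI CSI escape sequences (such as "\x1b[31m" colors) from one line.
/// Other escape families (OSC hyperlinks, etc.) are left untouched in this sketch.
fn strip_ansi(line: &str) -> String {
    let mut out = String::with_capacity(line.len());
    let mut chars = line.chars().peekable();
    while let Some(c) = chars.next() {
        if c == '\x1b' && chars.peek() == Some(&'[') {
            chars.next(); // consume '['
            // Skip parameter/intermediate bytes up to and including the final
            // byte, which lies in the range '@'..='~' (0x40..=0x7E).
            for d in chars.by_ref() {
                if ('@'..='~').contains(&d) {
                    break;
                }
            }
        } else {
            out.push(c);
        }
    }
    out
}

fn main() -> io::Result<()> {
    // Work a line at a time so memory stays flat however long the output is.
    let stdin = io::stdin();
    let mut out = BufWriter::new(io::stdout().lock());
    for line in stdin.lock().lines() {
        writeln!(out, "{}", strip_ansi(&line?))?;
    }
    out.flush()
}
```

Because each line is written as soon as it's cleaned, the filter adds little latency on top of the wrapped command; note that `lines()` returns an error on invalid UTF-8, which a production filter would need to handle.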
How it works: transparent, hook-based rewriting
The design goal was that I should never have to think about RTK. I don't want to teach the agent to "use rtk"; I want every command it already runs to get filtered automatically.
So RTK plugs into Claude Code as a hook. When the agent decides to run git status, the hook rewrites it to rtk git status before execution. RTK runs the real command, reshapes the output, and hands back the lean version. The agent never knows the proxy is there, and the rewrite itself costs zero tokens.
git status        →   rtk git status      (transparent, 0 tokens overhead)
cargo test        →   rtk cargo test
aws lambda list…  →   rtk aws lambda list-functions
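The rewrite step itself can be sketched as a pure function from command string to command string, leaving out the hook's input/output plumbing (which depends on Claude Code's hook interface). The allowlist and the `rewrite_command` name are hypothetical; RTK's real matching rules aren't described here.

```rust
/// Hypothetical allowlist of programs whose output the proxy knows how to filter.
const FILTERED: &[&str] = &["git", "cargo", "aws", "gh", "ps"];

/// Prefix `cmd` with `rtk` when its program is on the allowlist.
/// Commands already routed through `rtk` (or not on the list) pass through unchanged.
fn rewrite_command(cmd: &str) -> String {
    let trimmed = cmd.trim_start();
    let program = trimmed.split_whitespace().next().unwrap_or("");
    if FILTERED.iter().any(|p| *p == program) {
        format!("rtk {trimmed}")
    } else {
        cmd.to_string()
    }
}

fn main() {
    for cmd in ["git status", "cargo test", "rtk git status", "ls -la"] {
        println!("{cmd:<16} -> {}", rewrite_command(cmd));
    }
}
```

Because a command that already starts with `rtk` isn't on the allowlist, applying the rewrite twice is harmless. The sketch only inspects the first word, so compound shell commands (pipes, `&&`) pass through unchanged; a real hook has to decide how to handle those.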
There are a handful of meta-commands I do call directly:
rtk gain              # token-savings analytics (the table above)
rtk gain --history    # per-command history with savings
rtk discover          # mine Claude Code history for missed opportunities
rtk proxy <cmd>       # run a command raw, no filtering (escape hatch)
Where the savings actually come from
Not every command saves the same amount, and that's the interesting part. Breaking down my top commands by impact:
  #  Command                   Count   Saved    Avg%
 1.  rtk cargo test --work…        7   10.8M  100.0%
 2.  rtk aws lambda list-f…        9    3.1M   22.2%
 3.  rtk read                    602    2.7M   18.0%
 4.  rtk git push origin …         1    1.8M  100.0%
 5.  rtk cargo test --work…        1    1.8M  100.0%
 6.  rtk git push -u origi…        1    1.7M  100.0%
 7.  rtk aws cloudformatio…        2  324.0K   50.0%
 8.  rtk:toml ps -ef               6  298.1K   98.7%
 9.  rtk:toml ps aux               5  288.8K   98.5%
10.  rtk gh pr diff               18  110.7K   48.1%
Three distinct categories show up here:
Full suppression (100%). cargo test --workspace and git push produce enormous progress streams (compiler chatter, per-test lines, transfer counters), and the only thing the model needs is whether it passed or what failed. When everything succeeds, the right answer is a one-line summary, and the raw 10.8M tokens of test output never enter context. That single command class is my biggest win by a mile.
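A full-suppression filter of this kind can be sketched in a few dozen lines of Rust. It's an illustration, not RTK's filter: the function name and summary wording are made up, and it relies only on the `test NAME ... FAILED` and `test result:` lines of libtest's default output, which could change between toolchains. When no `test result:` line appears at all (a compile error, for instance), it passes the output through untouched.

```rust
/// Collapse `cargo test` output into a one-line summary when everything passed,
/// or into just the failing test lines when something failed. If no
/// "test result:" line is present, return the output as-is so nothing is hidden.
fn summarize_cargo_test(output: &str) -> String {
    let mut passed: u64 = 0;
    let mut saw_result = false;
    let mut failures: Vec<&str> = Vec::new();

    for line in output.lines().map(str::trim) {
        if line.starts_with("test ") && line.ends_with("... FAILED") {
            failures.push(line);
        } else if let Some(rest) = line.strip_prefix("test result: ") {
            saw_result = true;
            // `rest` looks like "ok. 12 passed; 0 failed; 0 ignored; ..."
            for part in rest.split(';') {
                let words: Vec<&str> = part.split_whitespace().collect();
                if let [.., n, "passed"] = words.as_slice() {
                    passed += n.parse::<u64>().unwrap_or(0);
                }
            }
        }
    }

    if !saw_result {
        output.to_string()
    } else if failures.is_empty() {
        format!("cargo test: ok ({passed} passed)")
    } else {
        format!("cargo test: {} failed\n{}", failures.len(), failures.join("\n"))
    }
}

fn main() {
    let raw = "running 2 tests\ntest a ... ok\ntest b ... ok\n\n\
               test result: ok. 2 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s\n";
    println!("{}", summarize_cargo_test(raw));
}
```

The pass-through branch matters as much as the summary: a build failure is exactly the case where the model needs the raw text.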
Reshaping (98%). ps -ef and ps aux are wide, repetitive tables. Run them through RTK's structured :toml output mode and you get a compact, parseable representation: 98%+ smaller, and easier for the model to reason about than a column-aligned ASCII table.
Trimming (18–50%). read, gh pr diff, and aws calls don't get gutted, since the content matters, but there's still 18–50% of pure formatting cruft to shave. Note that read ran 602 times: small per-call savings on a high-frequency command add up to 2.7M tokens.
The lesson I keep relearning: optimize both the things you do constantly (read) and the things that dump enormous one-shot payloads (cargo test). Those two extremes, not the middle, are where most of the volume hides.
Structured output with :toml mode
The :toml variants above aren't a gimmick. ASCII tables are optimized for human eyes: alignment, separators, headers repeated for readability. A model doesn't need any of that; it needs keys and values. Emitting ps aux as compact TOML drops the token count by ~98% and removes the ambiguity of parsing whitespace-aligned columns. Cheaper and more reliable at the same time is a rare trade to win.
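RTK's actual :toml schema isn't shown in this post, so the following is only a hypothetical Rust sketch of the idea: parse the header of ps-style output, keep the columns the model needs, and emit one `[[proc]]` table per row. The `proc` table name and the helper names are made up. In a sketch like this, most of the savings would come from dropping columns, not from the TOML syntax itself.

```rust
/// Quote a value as a TOML basic string. Only backslashes and double quotes are
/// escaped; control characters are assumed absent from `ps` output.
fn toml_str(s: &str) -> String {
    format!("\"{}\"", s.replace('\\', "\\\\").replace('"', "\\\""))
}

/// Split `line` into at most `n` whitespace-separated fields; the last field keeps
/// the remainder, because the final `ps` column (COMMAND) contains spaces.
fn split_fields(line: &str, n: usize) -> Vec<&str> {
    let mut fields = Vec::with_capacity(n);
    let mut rest = line.trim_start();
    while fields.len() + 1 < n && !rest.is_empty() {
        let end = rest.find(char::is_whitespace).unwrap_or(rest.len());
        fields.push(&rest[..end]);
        rest = rest[end..].trim_start();
    }
    if !rest.is_empty() {
        fields.push(rest.trim_end());
    }
    fields
}

/// Turn column-aligned `ps`-style output into a TOML array of tables, keeping only
/// the columns named in `keep`. Kept names are lowercased and must be valid bare
/// TOML keys (e.g. "PID", "COMMAND"; "%CPU" would need quoting).
fn ps_to_toml(output: &str, keep: &[&str]) -> String {
    let mut lines = output.lines();
    let header: Vec<&str> = match lines.next() {
        Some(h) => h.split_whitespace().collect(),
        None => return String::new(),
    };
    let mut out = String::new();
    for line in lines.filter(|l| !l.trim().is_empty()) {
        out.push_str("[[proc]]\n");
        for (name, value) in header.iter().zip(split_fields(line, header.len())) {
            if keep.iter().any(|k| k == name) {
                out.push_str(&format!("{} = {}\n", name.to_lowercase(), toml_str(value)));
            }
        }
    }
    out
}

fn main() {
    let ps = "USER PID %CPU COMMAND\nroot 1 0.0 /sbin/init splash\nme 42 1.5 cargo test --workspace\n";
    print!("{}", ps_to_toml(ps, &["PID", "COMMAND"]));
}
```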
Finding what you're missing: rtk discover
The one I reach for when I want to tune things is rtk discover. It reads back through my Claude Code history and flags commands that burned tokens but aren't being proxied yet (the long tail of "oh, I run that a lot"). It turns token optimization into a feedback loop instead of a guessing game: ship a filter, run for a week, ask discover what's still expensive, repeat.
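The internals of rtk discover aren't described here, so treat this as a hypothetical Rust sketch of the idea rather than its implementation. It assumes the history has already been reduced to (command, output tokens) pairs (the `Run` type is made up), groups un-proxied commands by their first two words, and ranks them by total tokens.

```rust
use std::collections::HashMap;

/// One executed command and the size of its output in tokens (hypothetical shape;
/// extracting this from agent history is out of scope for the sketch).
struct Run<'a> {
    command: &'a str,
    output_tokens: u64,
}

/// Rank commands that were NOT routed through `rtk` by total output tokens,
/// grouping by the first two words (e.g. "kubectl get").
fn missed_opportunities(runs: &[Run<'_>]) -> Vec<(String, u64)> {
    let mut totals: HashMap<String, u64> = HashMap::new();
    for run in runs {
        let words: Vec<&str> = run.command.split_whitespace().take(2).collect();
        if words.is_empty() || words[0] == "rtk" {
            continue; // empty, or already filtered
        }
        *totals.entry(words.join(" ")).or_insert(0) += run.output_tokens;
    }
    let mut ranked: Vec<(String, u64)> = totals.into_iter().collect();
    // Biggest token burners first; ties broken alphabetically for stable output.
    ranked.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    ranked
}

fn main() {
    let runs = [
        Run { command: "terraform plan", output_tokens: 900 },
        Run { command: "kubectl get pods", output_tokens: 500 },
        Run { command: "kubectl get svc", output_tokens: 300 },
        Run { command: "rtk git status", output_tokens: 50 },
    ];
    for (cmd, tokens) in missed_opportunities(&runs) {
        println!("{tokens:>8}  {cmd}");
    }
}
```

Grouping by two words is a crude heuristic; it lumps related subcommands together, which is usually the right granularity for deciding where a new filter would pay off.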
When not to filter
Filtering is lossy by definition, and sometimes I genuinely need the raw bytes: debugging a tool whose exact output format is the thing I'm investigating, or chasing a bug in a filter itself. That's what rtk proxy <cmd> is for: run it completely untouched. Having a clean escape hatch is what makes aggressive default filtering safe. If the trim ever hides something I needed, I'm one command away from the truth.
One footgun worth flagging: there's a name collision out there. If rtk gain errors with "command not found," you've probably got reachingforthejack/rtk (a Rust Type Kit) on your PATH instead. Running which rtk sorts it out.
What I learned
I went in thinking of this as a cost optimization, and it is, but the bigger payoff turned out to be context window hygiene. Tokens are cheap-ish; context is scarce. Every line of noise I strip is a line that isn't crowding out something the model actually needs to remember three turns from now. The 90% I'm not paying for is nice. The 90% that isn't diluting the agent's attention is the real win.
Next up: smarter, per-command filter profiles (a cargo test filter shouldn't look anything like an aws filter), and packaging RTK so it's a one-line install for anyone else running agents against a noisy CLI. If that's you, the output tax is real and you're almost certainly paying it. It's very killable.
— Parker Jones, parkerjones.dev

Building a Second Brain for Engineers: an LLM Capture-and-Synthesis Pipeline

2026-06-26T00:00:00+00:00

As an engineer, the context you need to do your job is scattered across a dozen systems: decisions made in chat threads, the why behind a change buried in a pull request, design rationale in a wiki page, a commitment made in a meeting nobody wrote down. Most of it decays. Six months later you're re-deriving a decision you already made because the only record was a Slack thread that scrolled into oblivion.

I built a pipeline to fix that — an LLM-assisted system that captures raw material from those systems and synthesizes it into a durable, searchable knowledge base. This post is the design, generalized. I'm deliberately keeping it tool- and employer-neutral; the value is in the architecture, which ports to any stack.

The three-layer model
The system has three layers, and keeping them separate is the whole game:
Layer 1: sources/   raw captures, one file per artifact, lightly structured
Layer 2: wiki/      synthesized pages: the compressed, durable knowledge
Layer 3: schema     the protocol: conventions, frontmatter, ingest rules

- Sources are raw. A captured chat thread, a PR snapshot, a meeting transcript: verbatim, with metadata, written to a file. Cheap and lossless.
- The wiki is synthesized. A synthesis agent reads new sources and folds them into durable pages: a glossary entry, an incident postmortem, a reusable pattern. The wiki's job is compression, not mirroring.
- The schema is the contract both layers obey: filename conventions, frontmatter shape, the ingest protocol. It changes rarely and deliberately.

The insight that made this work: capture and synthesis are different problems with different failure modes, so decouple them. Capture must be fast and reliable (you're stealing a moment to save something). Synthesis can be slow and batched (it runs later, reads everything new, thinks hard). Couple them and a slow synthesis step blocks you from capturing at all. Decouple them and capture is instant; synthesis happens on its own cadence and reads whatever has accumulated.

What to capture, per source

Every source type (chat, issue tracker, code host, docs/wiki, meeting transcripts, ad-hoc paste-ins) gets the same five-part treatment:

1. Capture-worthy signals. What's actually worth keeping. For a chat platform: threads where a decision is announced ("let's go with…", "approved", "ship it"), threads where you're mentioned and haven't replied, high-engagement threads in channels you watch.
2. Noise to skip. Bot notifications. Your own status pings. Reaction-only activity. The stuff that would bury the signal.
3. Output shape. The frontmatter and filename the capture writes, so the synthesis layer can consume it without guessing.
4. Cadence. Daily sweep? On-demand? Weekly?
5. Quirks. The source-specific gotchas (pagination, permalink expiry, HTML-to-markdown fidelity).

A captured issue-tracker ticket, for instance, lands as structured frontmatter plus a body:

---
source: issue-tracker
key: PROJ-1234
captured: 2026-06-26T08:00:00-04:00
status: Released
last_event: 2026-06-25T16:22:00-04:00
---

That last_event field is the high-water mark: on the next sweep, the capturer fetches only events newer than it, instead of re-pulling the whole ticket.

Capture mechanisms, and the trap everyone hits

There are three mechanism families, and choosing the right one per source is most of the design:

Mechanism                          Best for                              Trade-off
Skills (slash commands)            On-demand, mid-conversation capture   You have to remember to invoke them
Hooks (event-driven)               Local events you control              Narrower than the name suggests (see below)
Scheduled tasks (periodic sweeps)  External-system polling               Latency up to one cadence; needs dedup state

Here's the trap, and it's worth stating loudly because it cost me an afternoon: AI-agent hooks fire on conversation events (the agent stopped, a tool is about to run, the user submitted a prompt), not on external events. There is no "hook on PR opened" or "hook on Slack message." If you want event-driven capture from an external system, the real path is something outside the agent's hook system entirely: a git post-merge hook on your own machine, or a CI action that writes to a synced directory. Conflating "agent hook" with "webhook" sends you building something that can't exist.

So the actual recommendation per source is mostly: scheduled sweeps for external systems (poll daily, dedup against state), an on-demand skill for user-initiated capture (/capture <url> when you decide something matters), and hooks only in the narrow spots where a local event is the trigger.
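A scheduled sweep that honors the last_event high-water mark can be sketched as a pure function: given the events fetched from the external system and the mark stored in the capture's frontmatter, keep only what's newer and compute the next mark. The `Event` type, the `sweep` name, and the string comparison of timestamps are assumptions for illustration. Comparing RFC 3339 strings lexicographically is only correct when every timestamp uses the same format and UTC offset, so a real capturer would parse them.

```rust
/// One event from an external system (hypothetical shape). Timestamps are
/// RFC 3339 strings assumed to share one format and UTC offset.
#[derive(Debug, Clone)]
struct Event {
    at: String,
    text: String,
}

/// Return only events strictly newer than the stored high-water mark, plus the
/// mark to persist next (unchanged when there is nothing new).
fn sweep(events: &[Event], last_event: Option<&str>) -> (Vec<Event>, Option<String>) {
    let fresh: Vec<Event> = events
        .iter()
        .filter(|e| last_event.map_or(true, |mark| e.at.as_str() > mark))
        .cloned()
        .collect();
    let new_mark = fresh
        .iter()
        .map(|e| e.at.clone())
        .max()
        .or_else(|| last_event.map(str::to_string));
    (fresh, new_mark)
}

fn main() {
    let events = vec![
        Event { at: "2026-06-25T16:22:00-04:00".into(), text: "status -> Released".into() },
        Event { at: "2026-06-26T09:15:00-04:00".into(), text: "comment added".into() },
    ];
    let (fresh, mark) = sweep(&events, Some("2026-06-25T16:22:00-04:00"));
    println!("{} new event(s); next last_event = {:?}", fresh.len(), mark);
}
```

Writing the returned mark back into the capture's last_event field is what makes the next sweep incremental.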
Dedup and freshness via filename conventions

State is the enemy of reliability, so I push dedup into filenames instead of a database. Two shapes:

- Time-stamped capture: <source>-<id>-<YYYY-MM-DD>.md, for artifacts with no stable identity (a daily chat digest, an ad-hoc article). Each capture is its own file.
- Stable-entity capture: <source>-<id>.md, for things with a canonical identity, recaptured in place (a ticket, a PR, a wiki page).

Recapture semantics depend on the source's own model:

- Append-only, newest-first for sources with event timelines (tickets, PRs): each recapture prepends a dated section; old content stays.
- Versioned-replace with diff preservation for edit-replace sources (wiki pages): replace the body, but keep the prior version under a ## Previous version heading so the diff isn't lost.

The filename is the dedup key. Two captures of the same ticket can never land under two different names, which is exactly what happened before I imposed this convention.

When does a raw source become a wiki page?

Not everything earns synthesis. A source gets promoted to its own durable page when any of these holds:

1. Recurrence: the concept shows up 3+ times across captures and notes. Recurring relevance means compression pays off.
2. Decision-of-record: it documents a choice that'll be referenced later (an architecture decision, an incident root cause). Decisions get a page on first capture; the value is findability.
3. Reusable pattern: it describes a technique that applies beyond its origin. A pattern extracted from one incident belongs in the wiki because it'll apply to the next one.

Everything else stays raw. A one-off thread that resolved itself, a ticket shipped without revisiting: those live in sources/ as a searchable archive but never clutter the synthesized layer. Synthesis over enumeration: one wiki page can cite ten sources. The wiki compresses; it doesn't mirror.
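Both rules in this part of the pipeline are small enough to sketch together in Rust: building the capture filename (which doubles as the dedup key) and the any-of-three promotion test. The `Shape` and `Signals` types and their field names are hypothetical; in particular, counting mentions across captures is left to the caller.

```rust
/// How a source is captured (hypothetical names mirroring the two filename shapes).
enum Shape<'a> {
    /// No stable identity: one file per capture date.
    TimeStamped { date: &'a str },
    /// Canonical identity: one file, recaptured in place.
    StableEntity,
}

/// Build the capture filename; identical inputs always yield the same name,
/// so the filename works as the dedup key.
fn capture_filename(source: &str, id: &str, shape: &Shape<'_>) -> String {
    match shape {
        Shape::TimeStamped { date } => format!("{source}-{id}-{date}.md"),
        Shape::StableEntity => format!("{source}-{id}.md"),
    }
}

/// Signals the promotion rule looks at (hypothetical field names).
struct Signals {
    mentions: u32,
    decision_of_record: bool,
    reusable_pattern: bool,
}

/// Promote a source to its own wiki page when any of the three conditions holds.
fn should_promote(s: &Signals) -> bool {
    s.mentions >= 3 || s.decision_of_record || s.reusable_pattern
}

fn main() {
    println!("{}", capture_filename("issue-tracker", "PROJ-1234", &Shape::StableEntity));
    println!("{}", capture_filename("chat", "eng-general", &Shape::TimeStamped { date: "2026-06-26" }));
    let s = Signals { mentions: 1, decision_of_record: true, reusable_pattern: false };
    println!("promote: {}", should_promote(&s));
}
```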
The one rule you can't skip: sensitive content

The moment you point automated capture at chat and meetings, you're one bad sweep away from archiving someone's DM, an HR conversation, or customer-confidential material. The rule has to be default-deny for the sensitive class: skip private channels, skip DMs, require explicit confirmation before persisting a meeting transcript. It's safer to under-capture and manually add than to over-capture and have to scrub. Design this in from line one, not after the first incident.

Why this is worth the ceremony

"Why this much machinery for personal notes?" is a fair question, and the honest answer is: because the alternative (capturing by hand, when you remember, in whatever tool is open) doesn't scale past a few weeks of good intentions. The pipeline's entire purpose is to make capture cheaper than not capturing, and to make synthesis happen whether or not you feel like it that day.

The tools are interchangeable: your chat platform, your agent runner, your note format. The architecture is the durable part: raw capture decoupled from batched synthesis, dedup encoded in filenames, a promotion rule that keeps the synthesized layer small, and default-deny on anything sensitive. Build that, and the context you need stops decaying.

— Parker Jones, parkerjones.dev

Shipping Claude Skills with Nix: a Reproducible Agent Toolkit Across My Fleet

2026-06-26T00:00:00+00:00

A coding agent is only as good as the procedures you hand it. Out of the box, a model improvises every task from scratch; give it a skill (a written procedure for "do TDD," "diagnose a hard bug," "turn this into issues") and it stops guessing and starts following a method you trust. I've built up a collection of these, and once you have more than a handful the real problem isn't writing them, it's distribution: getting the same skills onto every machine I work from, version-pinned, without copy-pasting Markdown around. I solved that with Nix. Here's the setup.

What a skill is

My skills live in a repo (a fork of Matt Pocock's skills, credit where it's due), organized into buckets: engineering, qa, productivity, personal, misc. Each skill is a directory with a SKILL.md: YAML frontmatter naming it and describing when to use it, then the procedure itself.

---
name: diagnose
description: Disciplined diagnosis loop for hard bugs and performance regressions. Reproduce → minimise → hypothesise → instrument → fix → regression-test. Use when user says "diagnose this" / "debug this"...
---

# Diagnose
...

The description is load-bearing: it's what the agent matches against to decide whether the skill is relevant. A few I reach for constantly: tdd (red-green-refactor), diagnose (the loop above), to-prd / to-issues / triage (turning a vague ask into tracked work), and write-a-skill (the skill that writes more skills). Small, composable, model-agnostic. No framework owning my process, just procedures I can read, edit, and trust.

Three ways to install them

The repo supports three distribution paths, in increasing order of how much I actually rely on them.

1. npx, for anyone. The zero-commitment path:

npx skills@latest add parallaxisjones/skills

Pick the skills and agents you want, and you're set. Great for trying them on a machine that isn't mine.

2. A symlink script, for a local clone. If I've cloned the repo, link-skills.sh symlinks every SKILL.md into ~/.claude/skills/ so the CLI picks them up directly. It's idempotent (re-run it after pulling), and it specifically guards against the footgun where ~/.claude/skills is itself a symlink back into the repo, which would write per-skill symlinks into my own working copy:

# Refuse to link if ~/.claude/skills is a symlink that resolves to or into the repo.
if [ -L "$DEST" ]; then
  resolved="$(readlink -f "$DEST")"
  case "$resolved" in
    "$REPO"|"$REPO"/*)
      echo "refusing to pollute the repo" >&2
      exit 1 ;;
  esac
fi

That defensive check is the kind of thing you write after the first time a script eats its own tail.

3. Nix, for my actual fleet. This is the one that matters. My skills repo is a flake input in my system config, and a home-manager module materializes them declaratively on every machine.

The Nix wiring

Two inputs do the work: my skills repo, and agent-skills-nix, a home-manager module that knows how to turn a skills repo into materialized files:

# flake.nix
inputs = {
  agent-skills-nix = {
    url = "github:Kyure-A/agent-skills-nix";
    inputs.nixpkgs.follows = "nixpkgs";
    inputs.home-manager.follows = "home-manager";
  };
  my-skills.url = "github:parallaxisjones/skills";
};

Then in home-manager I point the module at my repo, filter to the buckets I want live, and enable everything:

agent-skills = {
  enable = true;
  sources.mine = {
    input = "my-skills";
    subdir = "skills";
    filter.nameRegex = "^(engineering|misc|personal|productivity)/.*";
  };
  skills.enableAll = true;
  targets.claude.enable = true;
};

That's the whole thing. On nixos-rebuild switch (or darwin-rebuild on the Mac), every SKILL.md matching the regex gets written into Claude's skills directory. The deprecated/ bucket is excluded by the filter, so retiring a skill is a one-line regex change, not a manual delete on five machines.

Why bother: what Nix actually buys here

You could argue the symlink script is simpler, and for one machine it is. The fleet is where Nix earns it:

- Pinning. flake.lock records the exact skills commit each machine is on. "Which version of my triage skill is the laptop running?" has an answer, not a shrug.
- Parity from scratch. A fresh machine reaches full skill parity as a side effect of building its system config. There's no separate "and don't forget to install your skills" step; it's the same switch that installs my shell and my packages.
- Atomic rollback. If a skill edit makes the agent behave worse, rolling back the generation rolls back the skills with it.

There's a framing I lean on for deciding what to manage this way: think of three zones. Zone 1 is stable, declarative config (the home-manager module enabling skills). Zone 3 is runtime state that should never touch Nix (per-conversation agent memory). Skills sit in Zone 2: authored content, edited as plain files in their repo, but materialized onto each machine by Nix. Nix owns where they land and which version; I own what they say.

The honest caveat

Nix doesn't validate skill content: a SKILL.md with a bad description will deploy just as reliably as a good one. This pipeline guarantees distribution and pinning, not quality. The quality comes from treating skills like code: review them, iterate on the descriptions when the agent picks the wrong one, and delete the ones that stop earning their place (that's what deprecated/ is for).

But that's the right division of labor. Writing a good procedure is human work. Making sure that procedure is identically present on every machine I touch is exactly the kind of toil Nix exists to kill. Treat your agent's skills as part of your declarative system, not as dotfiles you sync by hand.

— Parker Jones, parkerjones.dev

Parker Jones Dev Blog - claude-code
