<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
    <title>Parker Jones Dev Blog - systems-design</title>
    <subtitle>Dev Blog of Parker Jones</subtitle>
    <link rel="self" type="application/atom+xml" href="https://parkerjones.dev/tags/systems-design/atom.xml"/>
    <link rel="alternate" type="text/html" href="https://parkerjones.dev"/>
    <generator uri="https://www.getzola.org/">Zola</generator>
    <updated>2026-06-26T00:00:00+00:00</updated>
    <id>https://parkerjones.dev/tags/systems-design/atom.xml</id>
    <entry xml:lang="en">
        <title>Building a Second Brain for Engineers: an LLM Capture-and-Synthesis Pipeline</title>
        <published>2026-06-26T00:00:00+00:00</published>
        <updated>2026-06-26T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://parkerjones.dev/posts/second-brain-llm-pipeline/"/>
        <id>https://parkerjones.dev/posts/second-brain-llm-pipeline/</id>
        
        <content type="html" xml:base="https://parkerjones.dev/posts/second-brain-llm-pipeline/">&lt;p&gt;As an engineer, the context you need to do your job is scattered across a dozen systems: decisions made in chat threads, the &lt;em&gt;why&lt;&#x2F;em&gt; behind a change buried in a pull request, design rationale in a wiki page, a commitment made in a meeting nobody wrote down. Most of it decays. Six months later you&#x27;re re-deriving a decision you already made because the only record was a Slack thread that scrolled into oblivion.&lt;&#x2F;p&gt;
&lt;p&gt;I built a pipeline to fix that — an LLM-assisted system that &lt;em&gt;captures&lt;&#x2F;em&gt; raw material from those systems and &lt;em&gt;synthesizes&lt;&#x2F;em&gt; it into a durable, searchable knowledge base. This post is the design, generalized. I&#x27;m deliberately keeping it tool- and employer-neutral; the value is in the architecture, which ports to any stack.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-three-layer-model&quot;&gt;The three-layer model&lt;&#x2F;h2&gt;
&lt;p&gt;The system has three layers, and keeping them separate is the whole game:&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code data-lang=&quot;text&quot;&gt;  Layer 1: sources&#x2F;    raw captures, one file per artifact, lightly structured
  Layer 2: wiki&#x2F;       synthesized pages — the compressed, durable knowledge
  Layer 3: schema      the protocol: conventions, frontmatter, ingest rules
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Sources&lt;&#x2F;strong&gt; are raw. A captured chat thread, a PR snapshot, a meeting transcript — verbatim, with metadata, written to a file. Cheap and lossless.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;The wiki&lt;&#x2F;strong&gt; is synthesized. A synthesis agent reads new sources and folds them into durable pages: a glossary entry, an incident postmortem, a reusable pattern. The wiki&#x27;s job is &lt;em&gt;compression&lt;&#x2F;em&gt;, not mirroring.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;The schema&lt;&#x2F;strong&gt; is the contract both layers obey — filename conventions, frontmatter shape, the ingest protocol. It changes rarely and deliberately.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;The insight that made this work: &lt;strong&gt;capture and synthesis are different problems with different failure modes, so decouple them.&lt;&#x2F;strong&gt; Capture must be fast and reliable (you&#x27;re stealing a moment to save something). Synthesis can be slow and batched (it runs later, reads everything new, thinks hard). Couple them and a slow synthesis step blocks you from capturing at all. Decouple them and capture is instant; synthesis happens on its own cadence and reads whatever has accumulated.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;what-to-capture-per-source&quot;&gt;What to capture, per source&lt;&#x2F;h2&gt;
&lt;p&gt;Every source type — chat, issue tracker, code host, docs&#x2F;wiki, meeting transcripts, ad-hoc paste-ins — gets the same five-part treatment:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Capture-worthy signals.&lt;&#x2F;strong&gt; What&#x27;s actually worth keeping. For a chat platform: threads where a decision is announced (&quot;let&#x27;s go with…&quot;, &quot;approved&quot;, &quot;ship it&quot;), threads where you&#x27;re mentioned and haven&#x27;t replied, high-engagement threads in channels you watch.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Noise to skip.&lt;&#x2F;strong&gt; Bot notifications. Your own status pings. Reaction-only activity. The stuff that would bury the signal.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Output shape.&lt;&#x2F;strong&gt; The frontmatter and filename the capture writes — so the synthesis layer can consume it without guessing.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Cadence.&lt;&#x2F;strong&gt; Daily sweep? On-demand? Weekly?&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Quirks.&lt;&#x2F;strong&gt; The source-specific gotchas (pagination, permalink expiry, HTML-to-markdown fidelity).&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;A captured issue-tracker ticket, for instance, lands as structured frontmatter plus a body:&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code data-lang=&quot;yaml&quot;&gt;---
source: issue-tracker
key: PROJ-1234
captured: 2026-06-26T08:00:00-04:00
status: Released
last_event: 2026-06-25T16:22:00-04:00
---
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;That &lt;code&gt;last_event&lt;&#x2F;code&gt; field is the high-water mark: on the next sweep, the capturer fetches only events newer than it, instead of re-pulling the whole ticket.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;capture-mechanisms-and-the-trap-everyone-hits&quot;&gt;Capture mechanisms — and the trap everyone hits&lt;&#x2F;h2&gt;
&lt;p&gt;There are three mechanism families, and choosing the right one per source is most of the design:&lt;&#x2F;p&gt;
&lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Mechanism&lt;&#x2F;th&gt;&lt;th&gt;Best for&lt;&#x2F;th&gt;&lt;th&gt;Trade-off&lt;&#x2F;th&gt;&lt;&#x2F;tr&gt;&lt;&#x2F;thead&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;Skills&lt;&#x2F;strong&gt; (slash commands)&lt;&#x2F;td&gt;&lt;td&gt;On-demand, mid-conversation capture&lt;&#x2F;td&gt;&lt;td&gt;You have to remember to invoke them&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;Hooks&lt;&#x2F;strong&gt; (event-driven)&lt;&#x2F;td&gt;&lt;td&gt;Local events you control&lt;&#x2F;td&gt;&lt;td&gt;Narrower than the name suggests — see below&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;Scheduled tasks&lt;&#x2F;strong&gt; (periodic sweeps)&lt;&#x2F;td&gt;&lt;td&gt;External-system polling&lt;&#x2F;td&gt;&lt;td&gt;Latency up to one cadence; needs dedup state&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;&#x2F;tbody&gt;&lt;&#x2F;table&gt;
&lt;p&gt;Here&#x27;s the trap, and it&#x27;s worth stating loudly because it cost me an afternoon: &lt;strong&gt;AI-agent hooks fire on &lt;em&gt;conversation&lt;&#x2F;em&gt; events&lt;&#x2F;strong&gt; — the agent stopped, a tool is about to run, the user submitted a prompt — &lt;strong&gt;not on external events.&lt;&#x2F;strong&gt; There is no &quot;hook on PR opened&quot; or &quot;hook on Slack message.&quot; If you want event-driven capture from an external system, the real path is something outside the agent&#x27;s hook system entirely: a git &lt;code&gt;post-merge&lt;&#x2F;code&gt; hook on your own machine, or a CI action that writes to a synced directory. Conflating &quot;agent hook&quot; with &quot;webhook&quot; sends you building something that can&#x27;t exist.&lt;&#x2F;p&gt;
&lt;p&gt;So the actual recommendation per source is mostly: &lt;strong&gt;scheduled sweep for external systems&lt;&#x2F;strong&gt; (poll daily, dedup against state), &lt;strong&gt;on-demand skill for user-initiated capture&lt;&#x2F;strong&gt; (&lt;code&gt;&#x2F;capture &amp;lt;url&amp;gt;&lt;&#x2F;code&gt; when you decide something matters), and hooks only in the narrow spots where a &lt;em&gt;local&lt;&#x2F;em&gt; event is the trigger.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;dedup-and-freshness-via-filename-conventions&quot;&gt;Dedup and freshness via filename conventions&lt;&#x2F;h2&gt;
&lt;p&gt;State is the enemy of reliability, so I push dedup into filenames instead of a database. Two shapes:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Time-stamped capture&lt;&#x2F;strong&gt; — &lt;code&gt;&amp;lt;source&amp;gt;-&amp;lt;id&amp;gt;-&amp;lt;YYYY-MM-DD&amp;gt;.md&lt;&#x2F;code&gt; — for artifacts with no stable identity (a daily chat digest, an ad-hoc article). Each capture is its own file.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Stable-entity capture&lt;&#x2F;strong&gt; — &lt;code&gt;&amp;lt;source&amp;gt;-&amp;lt;id&amp;gt;.md&lt;&#x2F;code&gt; — for things with a canonical identity, recaptured &lt;em&gt;in place&lt;&#x2F;em&gt; (a ticket, a PR, a wiki page).&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;Recapture semantics depend on the source&#x27;s own model:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Append-only, newest-first&lt;&#x2F;strong&gt; for sources with event timelines (tickets, PRs): each recapture prepends a dated section; old content stays.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Versioned-replace with diff preservation&lt;&#x2F;strong&gt; for edit-replace sources (wiki pages): replace the body, but keep the prior version under a &lt;code&gt;## Previous version&lt;&#x2F;code&gt; heading so the diff isn&#x27;t lost.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;The filename &lt;em&gt;is&lt;&#x2F;em&gt; the dedup key. Two captures of the same ticket can never land under two different names — which, before I imposed this, is exactly what happened.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;when-does-a-raw-source-become-a-wiki-page&quot;&gt;When does a raw source become a wiki page?&lt;&#x2F;h2&gt;
&lt;p&gt;Not everything earns synthesis. A source gets promoted to its own durable page when &lt;strong&gt;any&lt;&#x2F;strong&gt; of these holds:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Recurrence&lt;&#x2F;strong&gt; — the concept shows up 3+ times across captures and notes. Recurring relevance means compression pays off.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Decision-of-record&lt;&#x2F;strong&gt; — it documents a choice that&#x27;ll be referenced later (an architecture decision, an incident root cause). Decisions get a page on first capture; the value is findability.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Reusable pattern&lt;&#x2F;strong&gt; — it describes a technique that applies beyond its origin. A pattern extracted from one incident belongs in the wiki because it&#x27;ll apply to the next one.&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;Everything else stays raw. A one-off thread that resolved itself, a ticket shipped without revisiting — those live in &lt;code&gt;sources&#x2F;&lt;&#x2F;code&gt; as a searchable archive but never clutter the synthesized layer. &lt;strong&gt;Synthesis over enumeration:&lt;&#x2F;strong&gt; one wiki page can cite ten sources. The wiki compresses; it doesn&#x27;t mirror.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-one-rule-you-can-t-skip-sensitive-content&quot;&gt;The one rule you can&#x27;t skip: sensitive content&lt;&#x2F;h2&gt;
&lt;p&gt;The moment you point automated capture at chat and meetings, you&#x27;re one bad sweep away from archiving someone&#x27;s DM, an HR conversation, or customer-confidential material. The rule has to be &lt;strong&gt;default-deny for the sensitive class&lt;&#x2F;strong&gt;: skip private channels, skip DMs, require explicit confirmation before persisting a meeting transcript. It&#x27;s safer to under-capture and manually add than to over-capture and have to scrub. Design this in from line one, not after the first incident.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;why-this-is-worth-the-ceremony&quot;&gt;Why this is worth the ceremony&lt;&#x2F;h2&gt;
&lt;p&gt;&quot;Why this much machinery for personal notes?&quot; is a fair question, and the honest answer is: because the alternative — capturing by hand, when you remember, in whatever tool is open — doesn&#x27;t scale past a few weeks of good intentions. The pipeline&#x27;s entire purpose is to make &lt;em&gt;capture cheaper than not capturing&lt;&#x2F;em&gt;, and to make synthesis happen whether or not you feel like it that day.&lt;&#x2F;p&gt;
&lt;p&gt;The tools are interchangeable — your chat platform, your agent runner, your note format. The architecture is the durable part: &lt;strong&gt;raw capture decoupled from batched synthesis, dedup encoded in filenames, a promotion rule that keeps the synthesized layer small, and default-deny on anything sensitive.&lt;&#x2F;strong&gt; Build that, and the context you need stops decaying.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;em&gt;— Parker Jones, &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;parkerjones.dev&quot;&gt;parkerjones.dev&lt;&#x2F;a&gt;&lt;&#x2F;em&gt;&lt;&#x2F;p&gt;
</content>
        
    </entry>
</feed>
