<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
    <title>Parker Jones Dev Blog - local-first</title>
    <subtitle>Dev Blog of Parker Jones</subtitle>
    <link rel="self" type="application/atom+xml" href="https://parkerjones.dev/tags/local-first/atom.xml"/>
    <link rel="alternate" type="text/html" href="https://parkerjones.dev"/>
    <generator uri="https://www.getzola.org/">Zola</generator>
    <updated>2026-06-26T00:00:00+00:00</updated>
    <id>https://parkerjones.dev/tags/local-first/atom.xml</id>
    <entry xml:lang="en">
        <title>Context Harness: a Local-First RAG Engine in Rust with Lua Extensions and an MCP Server</title>
        <published>2026-06-26T00:00:00+00:00</published>
        <updated>2026-06-26T00:00:00+00:00</updated>
        
        <author>
          <name>Parker Jones</name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://parkerjones.dev/posts/context-harness-rag-rust/"/>
        <id>https://parkerjones.dev/posts/context-harness-rag-rust/</id>
        
        <content type="html" xml:base="https://parkerjones.dev/posts/context-harness-rag-rust/">&lt;p&gt;AI tools are only useful when they can see &lt;em&gt;your&lt;&#x2F;em&gt; context — your docs, your code, your notes, the Hacker News thread you read last week. The usual answer is a cloud RAG service: ship your data to someone&#x27;s vector database, pay per query, hope it&#x27;s still up. I wanted the opposite — a single binary that ingests my stuff, indexes it locally, and hands it to any AI tool over a standard protocol, with no cloud dependency. So I built &lt;strong&gt;Context Harness&lt;&#x2F;strong&gt; (&lt;code&gt;ctx&lt;&#x2F;code&gt;).&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code data-lang=&quot;text&quot;&gt;$ ctx --help
Context Harness provides a connector-driven pipeline for ingesting documents
from multiple sources (filesystem, Git repositories, S3 buckets), chunking and
embedding them, and exposing hybrid search (keyword + semantic) via a CLI and
MCP-compatible HTTP server.
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;It&#x27;s a RAG engine that lives on your laptop. Here&#x27;s the design.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-pipeline&quot;&gt;The pipeline&lt;&#x2F;h2&gt;
&lt;p&gt;The flow is the standard RAG shape, but every stage is local and configurable through one &lt;code&gt;ctx.toml&lt;&#x2F;code&gt;:&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code data-lang=&quot;text&quot;&gt;connectors → sync → chunk → embed → SQLite → hybrid search → CLI &#x2F; MCP
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;&lt;strong&gt;Connectors&lt;&#x2F;strong&gt; define where documents come from — filesystem globs, Git repos, S3 buckets, or custom Lua scripts (more on those below). &lt;code&gt;ctx sync&lt;&#x2F;code&gt; pulls from a connector and runs the rest of the pipeline.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Chunking and embedding&lt;&#x2F;strong&gt; are configured, not hard-coded:&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code data-lang=&quot;toml&quot;&gt;[chunking]
max_tokens = 700
overlap_tokens = 80

[embedding]
provider = &amp;quot;openai&amp;quot;
model = &amp;quot;text-embedding-3-small&amp;quot;
dims = 1536
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The &lt;code&gt;provider&lt;&#x2F;code&gt; is pluggable: that line can point at OpenAI&#x27;s &lt;code&gt;text-embedding-3-small&lt;&#x2F;code&gt; or at a &lt;strong&gt;fully local&lt;&#x2F;strong&gt; model. My test setup runs &lt;code&gt;fastembed&lt;&#x2F;code&gt; with a quantized &lt;code&gt;all-MiniLM-L6-v2&lt;&#x2F;code&gt; ONNX model, so embeddings happen on-device with no API call at all. That choice is the whole &quot;local-first&quot; thesis in one config key: trade a little retrieval quality for zero cloud dependency and zero per-query cost. The &lt;code&gt;dims&lt;&#x2F;code&gt; value follows the model: &lt;code&gt;text-embedding-3-small&lt;&#x2F;code&gt; produces 1536-dimensional vectors, while &lt;code&gt;all-MiniLM-L6-v2&lt;&#x2F;code&gt; produces 384.&lt;&#x2F;p&gt;
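&lt;p&gt;To make the &lt;code&gt;[chunking]&lt;&#x2F;code&gt; numbers concrete, here is a minimal Rust sketch of overlapping windows. It assumes whitespace-separated words stand in for tokens, and &lt;code&gt;chunk_tokens&lt;&#x2F;code&gt; is a hypothetical helper for illustration; &lt;code&gt;ctx&lt;&#x2F;code&gt;&#x27;s real tokenizer and boundary rules may differ.&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code data-lang=&quot;rust&quot;&gt;&#x2F;&#x2F;&#x2F; Split `tokens` into windows of at most `max_tokens` items, where each
&#x2F;&#x2F;&#x2F; window repeats the last `overlap_tokens` items of the previous one.
&#x2F;&#x2F;&#x2F; Hypothetical helper for illustration, not ctx&amp;#39;s actual chunker.
fn chunk_tokens(tokens: &amp;amp;[&amp;amp;str], max_tokens: usize, overlap_tokens: usize) -&amp;gt; Vec&amp;lt;Vec&amp;lt;String&amp;gt;&amp;gt; {
    assert!(overlap_tokens &amp;lt; max_tokens, &amp;quot;overlap must be smaller than the window&amp;quot;);
    let step = max_tokens - overlap_tokens; &#x2F;&#x2F; how far each window advances
    let mut chunks: Vec&amp;lt;Vec&amp;lt;String&amp;gt;&amp;gt; = Vec::new();
    let mut start = 0;
    while start &amp;lt; tokens.len() {
        let end = (start + max_tokens).min(tokens.len());
        chunks.push(tokens[start..end].iter().map(|t| t.to_string()).collect());
        if end == tokens.len() {
            break; &#x2F;&#x2F; this window reached the end of the input
        }
        start += step;
    }
    chunks
}

fn main() {
    let tokens: Vec&amp;lt;&amp;amp;str&amp;gt; = &amp;quot;a b c d e f g h i j&amp;quot;.split_whitespace().collect();
    &#x2F;&#x2F; Tiny numbers so the overlap is visible; the config above uses 700 and 80.
    for chunk in chunk_tokens(&amp;amp;tokens, 4, 1) {
        println!(&amp;quot;{}&amp;quot;, chunk.join(&amp;quot; &amp;quot;));
    }
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;With a window of 4 and an overlap of 1 this prints &lt;code&gt;a b c d&lt;&#x2F;code&gt;, &lt;code&gt;d e f g&lt;&#x2F;code&gt;, and &lt;code&gt;g h i j&lt;&#x2F;code&gt;: each chunk starts with the last token of the one before it, so a sentence that straddles a boundary still appears whole in at least one chunk.&lt;&#x2F;p&gt;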
&lt;p&gt;&lt;strong&gt;Storage is SQLite.&lt;&#x2F;strong&gt; No vector-database service to stand up, no Docker, no daemon. The index is a file you can copy, back up, or delete. For a single-user knowledge base, a managed vector DB is wildly over-provisioned; SQLite is exactly right.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;hybrid-search&quot;&gt;Hybrid search&lt;&#x2F;h2&gt;
&lt;p&gt;Pure semantic search misses exact terms; pure keyword search misses paraphrases. &lt;code&gt;ctx&lt;&#x2F;code&gt; does both and blends them:&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code data-lang=&quot;toml&quot;&gt;[retrieval]
final_limit = 12
hybrid_alpha = 0.6           # 0 = keyword only, 1 = semantic only
candidate_k_keyword = 80     # pull 80 keyword candidates
candidate_k_vector  = 80     # and 80 vector candidates
group_by = &amp;quot;document&amp;quot;        # then group results by document
doc_agg = &amp;quot;max&amp;quot;              # score a document by its best chunk
max_chunks_per_doc = 3       # return at most 3 chunks per document
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;It gathers up to 80 candidates each from a keyword index and a vector index, blends their scores with &lt;code&gt;hybrid_alpha&lt;&#x2F;code&gt;, groups by document so one long file can&#x27;t flood the results, and returns the top 12. Moving &lt;code&gt;hybrid_alpha&lt;&#x2F;code&gt; toward 0 favors &quot;find the exact phrase&quot;; moving it toward 1 favors &quot;find the related idea.&quot;&lt;&#x2F;p&gt;
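&lt;p&gt;One way to read that blend is as a weighted sum followed by per-document grouping. The Rust sketch below assumes both scores are already normalized to the 0 to 1 range and that &lt;code&gt;hybrid_alpha&lt;&#x2F;code&gt; weights the semantic side (0 = keyword only, 1 = semantic only). The &lt;code&gt;Candidate&lt;&#x2F;code&gt; struct and &lt;code&gt;rank_documents&lt;&#x2F;code&gt; function are hypothetical names, and it leaves out &lt;code&gt;max_chunks_per_doc&lt;&#x2F;code&gt;; &lt;code&gt;ctx&lt;&#x2F;code&gt;&#x27;s actual scoring may differ.&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code data-lang=&quot;rust&quot;&gt;use std::collections::HashMap;

&#x2F;&#x2F;&#x2F; One retrieved chunk with its two scores, both assumed normalized to 0..=1.
struct Candidate {
    doc_id: &amp;amp;&amp;#39;static str,
    keyword: f64,
    semantic: f64,
}

&#x2F;&#x2F;&#x2F; Blend scores with `alpha` (0 = keyword only, 1 = semantic only), keep each
&#x2F;&#x2F;&#x2F; document&amp;#39;s best chunk score (a &amp;quot;max&amp;quot; aggregation), and return the top
&#x2F;&#x2F;&#x2F; `limit` documents, best first. Hypothetical sketch, not ctx&amp;#39;s actual scoring.
fn rank_documents(cands: &amp;amp;[Candidate], alpha: f64, limit: usize) -&amp;gt; Vec&amp;lt;(&amp;amp;&amp;#39;static str, f64)&amp;gt; {
    let mut best: HashMap&amp;lt;&amp;amp;&amp;#39;static str, f64&amp;gt; = HashMap::new();
    for c in cands {
        let score = alpha * c.semantic + (1.0 - alpha) * c.keyword;
        let entry = best.entry(c.doc_id).or_insert(score);
        if score &amp;gt; *entry {
            *entry = score;
        }
    }
    let mut ranked: Vec&amp;lt;(&amp;amp;&amp;#39;static str, f64)&amp;gt; = best.into_iter().collect();
    ranked.sort_by(|a, b| b.1.total_cmp(&amp;amp;a.1));
    ranked.truncate(limit);
    ranked
}

fn main() {
    let cands = [
        Candidate { doc_id: &amp;quot;retry-policy.md&amp;quot;, keyword: 0.9, semantic: 0.4 },
        Candidate { doc_id: &amp;quot;retry-policy.md&amp;quot;, keyword: 0.2, semantic: 0.5 },
        Candidate { doc_id: &amp;quot;backoff-notes.md&amp;quot;, keyword: 0.1, semantic: 0.8 },
    ];
    for (doc, score) in rank_documents(&amp;amp;cands, 0.6, 12) {
        println!(&amp;quot;{doc}: {score:.2}&amp;quot;);
    }
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;With &lt;code&gt;alpha = 0.6&lt;&#x2F;code&gt;, &lt;code&gt;retry-policy.md&lt;&#x2F;code&gt; ranks first at 0.60 on the strength of its keyword match, ahead of the mostly semantic match in &lt;code&gt;backoff-notes.md&lt;&#x2F;code&gt; at 0.52. The second, weaker chunk from &lt;code&gt;retry-policy.md&lt;&#x2F;code&gt; does not add a second entry.&lt;&#x2F;p&gt;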
&lt;h2 id=&quot;the-part-that-makes-it-an-engine-the-mcp-server&quot;&gt;The part that makes it an &lt;em&gt;engine&lt;&#x2F;em&gt;: the MCP server&lt;&#x2F;h2&gt;
&lt;p&gt;A search CLI is handy. A search index that any AI tool can query on its own is far more useful:&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code data-lang=&quot;text&quot;&gt;$ ctx serve mcp
# Exposes search&#x2F;get over a JSON API for Cursor, Claude, and other
# MCP-compatible AI tools.
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;modelcontextprotocol.io&#x2F;&quot;&gt;MCP&lt;&#x2F;a&gt; (Model Context Protocol) is an open protocol that AI tools use to call external tools. By speaking it, &lt;code&gt;ctx&lt;&#x2F;code&gt; turns your local knowledge base into a tool the model can reach for mid-conversation: ask it to &quot;search my notes for the retry-policy decision,&quot; and the model queries your SQLite index and gets grounded results. The index itself never leaves the machine; only the snippets a query returns are handed to the model. That&#x27;s the feature I use every day.&lt;&#x2F;p&gt;
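&lt;p&gt;For a sense of what such a call looks like, MCP tool invocations are JSON-RPC 2.0 requests that use the &lt;code&gt;tools&#x2F;call&lt;&#x2F;code&gt; method. The Rust sketch below builds one for the &lt;code&gt;search&lt;&#x2F;code&gt; tool. The &lt;code&gt;query&lt;&#x2F;code&gt; argument name and both function names are assumptions for illustration; the actual argument schema is whatever &lt;code&gt;ctx&lt;&#x2F;code&gt; advertises to the client.&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code data-lang=&quot;rust&quot;&gt;&#x2F;&#x2F;&#x2F; Escape the characters that may not appear raw inside a JSON string.
fn json_escape(s: &amp;amp;str) -&amp;gt; String {
    let mut out = String::with_capacity(s.len());
    for ch in s.chars() {
        match ch {
            &amp;#39;&amp;quot;&amp;#39; =&amp;gt; out.push_str(&amp;quot;\\\&amp;quot;&amp;quot;),
            &amp;#39;\\&amp;#39; =&amp;gt; out.push_str(&amp;quot;\\\\&amp;quot;),
            c if (c as u32) &amp;lt; 0x20 =&amp;gt; out.push_str(&amp;amp;format!(&amp;quot;\\u{:04x}&amp;quot;, c as u32)),
            c =&amp;gt; out.push(c),
        }
    }
    out
}

&#x2F;&#x2F;&#x2F; Build an MCP `tools&#x2F;call` request (JSON-RPC 2.0) for a tool named `search`.
&#x2F;&#x2F;&#x2F; The `query` argument name is hypothetical.
fn search_request(id: u64, query: &amp;amp;str) -&amp;gt; String {
    format!(
        r#&amp;quot;{{&amp;quot;jsonrpc&amp;quot;:&amp;quot;2.0&amp;quot;,&amp;quot;id&amp;quot;:{},&amp;quot;method&amp;quot;:&amp;quot;tools&#x2F;call&amp;quot;,&amp;quot;params&amp;quot;:{{&amp;quot;name&amp;quot;:&amp;quot;search&amp;quot;,&amp;quot;arguments&amp;quot;:{{&amp;quot;query&amp;quot;:&amp;quot;{}&amp;quot;}}}}}}&amp;quot;#,
        id,
        json_escape(query)
    )
}

fn main() {
    println!(&amp;quot;{}&amp;quot;, search_request(1, &amp;quot;retry-policy decision&amp;quot;));
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The AI client sends a message of this shape and the server replies with the matching chunks. How &lt;code&gt;ctx&lt;&#x2F;code&gt; frames it on its HTTP endpoint is not covered here.&lt;&#x2F;p&gt;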
&lt;h2 id=&quot;extensibility-connectors-tools-and-agents-in-lua&quot;&gt;Extensibility: connectors, tools, and agents in Lua&lt;&#x2F;h2&gt;
&lt;p&gt;Built-in connectors cover the common cases, but the interesting data is always somewhere weird. So &lt;code&gt;ctx&lt;&#x2F;code&gt; embeds Lua: you can script &lt;strong&gt;connectors&lt;&#x2F;strong&gt; (new data sources), &lt;strong&gt;tools&lt;&#x2F;strong&gt; (new capabilities), and &lt;strong&gt;agents&lt;&#x2F;strong&gt; (personas with a system prompt and a scoped toolset). Each has &lt;code&gt;init&lt;&#x2F;code&gt;&#x2F;&lt;code&gt;test&lt;&#x2F;code&gt; scaffolding:&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code data-lang=&quot;text&quot;&gt;$ ctx connector init    # scaffold a new Lua connector from a template
$ ctx connector test    # run it without writing to the DB
$ ctx agent init&#x2F;test&#x2F;list    # agents get the same subcommands, plus list
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The agent system is the one I&#x27;m proudest of. An agent is a Lua script that, at resolve time, &lt;em&gt;assembles its own context&lt;&#x2F;em&gt; by querying the knowledge base, then hands the model a system prompt plus pre-loaded research and a scoped set of tools. Here&#x27;s a real one — &lt;code&gt;hn-writer&lt;&#x2F;code&gt;, which writes Hacker News launch posts:&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code data-lang=&quot;lua&quot;&gt;agent = {
    name = &amp;quot;hn-writer&amp;quot;,
    description = &amp;quot;Write Hacker News posts by studying top HN content and your product docs&amp;quot;,
    tools = { &amp;quot;search&amp;quot;, &amp;quot;get&amp;quot; },          -- scoped: this agent can only search and fetch
    arguments = {
        { name = &amp;quot;style&amp;quot;, description = &amp;quot;show_hn, launch, ask_hn, or discussion&amp;quot; },
        { name = &amp;quot;angle&amp;quot;, description = &amp;quot;e.g. &amp;#39;local-first&amp;#39;, &amp;#39;developer tooling&amp;#39;&amp;quot; },
        { name = &amp;quot;tone&amp;quot;,  description = &amp;quot;technical, conversational, or minimal&amp;quot; },
    },
}

function agent.resolve(args, config, context)
    -- pre-load HN trends from the knowledge base
    for _, q in ipairs({ &amp;quot;Show HN&amp;quot;, &amp;quot;Rust CLI tool&amp;quot;, &amp;quot;local first&amp;quot;, &amp;quot;AI context&amp;quot; }) do
        local results = context.search(q, { mode = &amp;quot;keyword&amp;quot;, limit = 5 })
        -- ...fold the top results into the prompt as research...
    end
    -- ...also search the project&amp;#39;s own docs, then return
    -- (system_prompt and preloaded are built in the elided steps above):
    return { system = system_prompt, tools = { &amp;quot;search&amp;quot;, &amp;quot;get&amp;quot; }, messages = preloaded }
end
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;What I love about this pattern: the agent does its own retrieval &lt;em&gt;before&lt;&#x2F;em&gt; the model gets involved, so the model starts with both &quot;what performs well on HN right now&quot; (from a connector that ingests HN) and &quot;what this product actually does&quot; (from a filesystem connector over the docs) already in context. And there&#x27;s a pleasant recursion to it — I have an agent whose job is to write the Show HN post for the tool the agent runs on. Its prompt even encodes the house style: &lt;em&gt;&quot;What HN hates: marketing speak, buzzwords, superlatives… technical substance over marketing language.&quot;&lt;&#x2F;em&gt; Which, not coincidentally, is the ethos of this whole blog.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;sharing-extensions-registries&quot;&gt;Sharing extensions: registries&lt;&#x2F;h2&gt;
&lt;p&gt;Lua scripts are easy to share, so &lt;code&gt;ctx&lt;&#x2F;code&gt; supports &lt;strong&gt;registries&lt;&#x2F;strong&gt;, which are Git repos of community connectors, tools, and agents that sync in automatically:&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code data-lang=&quot;toml&quot;&gt;[registries.community]
url = &amp;quot;https:&#x2F;&#x2F;github.com&#x2F;parallax-labs&#x2F;ctx-registry.git&amp;quot;
auto_update = true
readonly = true
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;A registry is just a versioned directory of &lt;code&gt;.lua&lt;&#x2F;code&gt; files; pointing at one makes its connectors, tools, and agents available locally. It&#x27;s the same &quot;distribute capability declaratively&quot; idea I use for &lt;a href=&quot;&#x2F;posts&#x2F;skills-with-nix&#x2F;&quot;&gt;agent skills&lt;&#x2F;a&gt;, applied to data connectors.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;static-site-search-for-free&quot;&gt;Static-site search, for free&lt;&#x2F;h2&gt;
&lt;p&gt;One more trick. &lt;code&gt;ctx export&lt;&#x2F;code&gt; dumps the whole index to JSON for client-side search:&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code data-lang=&quot;text&quot;&gt;$ ctx export
# Exports documents and chunks to JSON for use with ctx-search.js —
# client-side search on a static site.
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Which means the same engine that grounds my AI tools could also power search on &lt;em&gt;this&lt;&#x2F;em&gt; blog — index the posts, export the JSON, search it in the browser with no backend. (My &lt;a href=&quot;&#x2F;posts&#x2F;consoler-dark-theme&#x2F;&quot;&gt;terminal theme already runs a database in the browser&lt;&#x2F;a&gt;, so this is a natural next step.)&lt;&#x2F;p&gt;
&lt;h2 id=&quot;honest-trade-offs&quot;&gt;Honest trade-offs&lt;&#x2F;h2&gt;
&lt;p&gt;Context Harness is young, and local-first is a set of trade-offs, not a free lunch:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Local embeddings are private and free but lower-quality&lt;&#x2F;strong&gt; than the big cloud models. &lt;code&gt;provider&lt;&#x2F;code&gt; lets you choose per use case, but you don&#x27;t get both at once.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;SQLite scales to a personal knowledge base, not a team&#x27;s corpus.&lt;&#x2F;strong&gt; That&#x27;s the design target, not a bug — but know the ceiling.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Lua is power and rope.&lt;&#x2F;strong&gt; Scriptable connectors mean I can ingest anything; they also mean a bad script can do bad things. &lt;code&gt;connector test&lt;&#x2F;code&gt; (which never writes to the DB) exists for exactly that reason.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;But the core bet has paid off: a single Rust binary, a SQLite file, optional fully-local embeddings, and a standard protocol are enough to give every AI tool I use grounded access to my own context, without renting a vector database to do it.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;em&gt;— Parker Jones, &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;parkerjones.dev&quot;&gt;parkerjones.dev&lt;&#x2F;a&gt;&lt;&#x2F;em&gt;&lt;&#x2F;p&gt;
</content>
        
    </entry>
</feed>
