Context Harness: a Local-First RAG Engine in Rust with Lua Extensions and an MCP Server

2026-06-26T00:00:00+00:00
AI tools are only useful when they can see your context: your docs, your code, your notes, the Hacker News thread you read last week. The usual answer is a cloud RAG service: ship your data to someone's vector database, pay per query, hope it's still up. I wanted the opposite, a single binary that ingests my stuff, indexes it locally, and hands it to any AI tool over a standard protocol, with no cloud dependency. So I built Context Harness (ctx).
```text
$ ctx --help
Context Harness provides a connector-driven pipeline for ingesting documents
from multiple sources (filesystem, Git repositories, S3 buckets), chunking and
embedding them, and exposing hybrid search (keyword + semantic) via a CLI and
MCP-compatible HTTP server.
```
It's a RAG engine that lives on your laptop. Here's the design.
The pipeline
The flow is the standard RAG shape, but every stage is local and configurable through one ctx.toml:
```text
connectors → sync → chunk → embed → SQLite → hybrid search → CLI / MCP
```
Connectors define where documents come from: filesystem globs, Git repos, S3 buckets, or custom Lua scripts (more on those below). ctx sync pulls from a connector and runs the rest of the pipeline.
Chunking and embedding are configured, not hard-coded:
```toml
[chunking]
max_tokens = 700
overlap_tokens = 80

[embedding]
provider = "openai"
model = "text-embedding-3-small"
dims = 1536
```
The provider is pluggable: that line can point at OpenAI's text-embedding-3-small or at a fully local model. My test setup runs fastembed with a quantized all-MiniLM-L6-v2 ONNX model, so embeddings happen on-device with no API call at all. That choice is the whole "local-first" thesis in one config key: trade a little retrieval quality for zero cloud dependency and zero per-query cost.
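For intuition on what max_tokens and overlap_tokens in the [chunking] table control, here is a minimal sliding-window sketch. It is in Python for brevity (ctx itself is Rust) and is not ctx's actual chunker: the function name chunk_tokens is hypothetical, and whitespace splitting stands in for whatever tokenizer the real pipeline uses.

```python
def chunk_tokens(tokens, max_tokens=700, overlap_tokens=80):
    """Split a token list into windows of at most max_tokens, where each
    window starts with the last overlap_tokens of the previous one."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap_tokens < max_tokens:
        raise ValueError("overlap_tokens must be in [0, max_tokens)")
    step = max_tokens - overlap_tokens
    chunks = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + max_tokens])
        if start + max_tokens >= len(tokens):
            break  # this window already reached the end of the document
        start += step
    return chunks


# Assumption: whitespace-separated words stand in for real tokens.
words = ("lorem ipsum " * 1000).split()  # 2000 pseudo-tokens
chunks = chunk_tokens(words)
print([len(c) for c in chunks])  # [700, 700, 700, 140]
```

With these defaults each window advances by 620 tokens, so a sentence that straddles a boundary still appears whole in at least one chunk as long as it is shorter than the overlap.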
Storage is SQLite. No vector-database service to stand up, no Docker, no daemon. The index is a file you can copy, back up, or delete. For a single-user knowledge base, a managed vector DB is wildly over-provisioned; SQLite is exactly right.
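To see why a plain SQLite file can be enough at single-user scale, here is a rough sketch that stores embeddings as BLOBs and answers a vector query with a brute-force cosine scan. Everything here is an assumption for illustration: the chunks table, its columns, and the helper names are hypothetical, and the article does not say how ctx lays out its index or whether it scans or uses an index structure.

```python
import math
import sqlite3
from array import array


def to_blob(vec):
    """Pack a list of floats into a float32 byte string."""
    return array("f", vec).tobytes()


def from_blob(blob):
    """Unpack a float32 byte string back into a list of floats."""
    out = array("f")
    out.frombytes(blob)
    return list(out)


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical schema. ":memory:" keeps the example self-contained;
# a real index would be a file path instead.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE chunks (id INTEGER PRIMARY KEY, doc TEXT, body TEXT, embedding BLOB)"
)
rows = [
    ("notes/retry.md", "Retry with exponential backoff", [0.9, 0.1, 0.0]),
    ("notes/lunch.md", "Sandwich places near the office", [0.0, 0.2, 0.9]),
]
db.executemany(
    "INSERT INTO chunks (doc, body, embedding) VALUES (?, ?, ?)",
    [(doc, body, to_blob(vec)) for doc, body, vec in rows],
)


def vector_search(db, query_vec, k=5):
    """Brute-force scan: score every stored chunk against the query vector."""
    scored = [
        (cosine(query_vec, from_blob(blob)), doc)
        for doc, blob in db.execute("SELECT doc, embedding FROM chunks")
    ]
    return sorted(scored, reverse=True)[:k]


print(vector_search(db, [1.0, 0.0, 0.0], k=1))
```

A linear scan like this costs one dot product per chunk, which is fine for a personal corpus and is exactly the kind of thing that stops being fine at team scale.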
Hybrid search
Pure semantic search misses exact terms; pure keyword search misses paraphrase. ctx does both and blends them:
```toml
[retrieval]
final_limit = 12
hybrid_alpha = 0.6           # weight toward semantic vs keyword
candidate_k_keyword = 80     # pull 80 keyword candidates
candidate_k_vector  = 80     # and 80 vector candidates
group_by = "document"        # then dedup/group by document
doc_agg = "max"
max_chunks_per_doc = 3
```
It gathers candidates from both a keyword index and a vector index, blends the scores with hybrid_alpha, groups by document so one long file can't flood the results, and returns the top 12. Tuning hybrid_alpha toward 0 favors "find the exact phrase"; tuning it toward 1 favors "find the related idea."
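The blend-then-group step described above can be sketched in a few lines. This is an illustration under stated assumptions, not ctx's scoring code: the linear formula alpha * vector + (1 - alpha) * keyword is inferred from the "weight toward semantic vs keyword" comment, the min-max normalization is my own choice for putting the two score scales on equal footing, and the function names are hypothetical.

```python
from collections import defaultdict


def normalize(scores):
    """Min-max scale a {chunk_id: score} map into [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {cid: 1.0 for cid in scores}
    return {cid: (s - lo) / (hi - lo) for cid, s in scores.items()}


def hybrid_search(keyword, vector, doc_of, alpha=0.6,
                  max_chunks_per_doc=3, final_limit=12):
    """Blend keyword and vector candidates, then group chunks by document.

    keyword / vector: {chunk_id: raw_score} from each index (higher is better).
    doc_of: {chunk_id: document_id}.
    Returns [(doc_score, document_id, [chunk_ids])], best document first.
    """
    kw, vec = normalize(keyword), normalize(vector)
    # A chunk missing from one candidate list contributes 0 from that side.
    blended = {
        cid: alpha * vec.get(cid, 0.0) + (1 - alpha) * kw.get(cid, 0.0)
        for cid in set(kw) | set(vec)
    }
    by_doc = defaultdict(list)
    for cid, score in blended.items():
        by_doc[doc_of[cid]].append((score, cid))
    results = []
    for doc, hits in by_doc.items():
        hits.sort(reverse=True)
        top = hits[:max_chunks_per_doc]
        # doc_agg = "max": a document ranks by its best-scoring chunk.
        results.append((top[0][0], doc, [cid for _, cid in top]))
    results.sort(reverse=True)
    return results[:final_limit]


keyword = {"a1": 12.0, "a2": 3.0, "b1": 7.5}   # e.g. raw keyword-match scores
vector = {"a1": 0.2, "b1": 0.9, "c1": 0.55}    # e.g. cosine similarities
doc_of = {"a1": "a.md", "a2": "a.md", "b1": "b.md", "c1": "c.md"}

for alpha in (0.0, 0.6, 1.0):
    ranked = hybrid_search(keyword, vector, doc_of, alpha=alpha)
    print(alpha, [doc for _, doc, _ in ranked])
```

In this toy data a.md wins on keyword score and b.md wins on semantic score, so sliding alpha from 0 to 1 flips which document ranks first.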
The part that makes it an engine: the MCP server
A search CLI is handy. A search CLI that any AI tool can call as a tool is a force multiplier:
```shell
$ ctx serve mcp
# Exposes search/get over a JSON API for Cursor, Claude, and other
# MCP-compatible AI tools.
```
MCP (the Model Context Protocol) is the protocol AI tools use to call external tools. By speaking it, ctx turns your local knowledge base into a tool the model can reach for mid-conversation: ask it to "search my notes for the retry-policy decision," and the model queries your SQLite index and gets grounded results, without your notes ever leaving the machine. That's the feature I use every day.
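For a sense of what travels over the wire, MCP messages are JSON-RPC 2.0, and a client invokes a tool with the tools/call method. The sketch below only builds such a request; it does not talk to a server. The tool name search comes from the help text above, but the argument names query and limit are assumptions, since a real client would read the tool's input schema from a tools/list response first.

```python
import json


def tool_call(request_id, tool, arguments):
    """Build a JSON-RPC 2.0 request for MCP's tools/call method."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


# Assumption: "query" and "limit" are illustrative argument names,
# not taken from ctx's published tool schema.
request = tool_call(1, "search", {"query": "retry-policy decision", "limit": 5})
print(json.dumps(request, indent=2))
```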
Extensibility: connectors, tools, and agents in Lua
Built-in connectors cover the common cases, but the interesting data is always somewhere weird. So ctx embeds Lua: you can script connectors (new data sources), tools (new capabilities), and agents (personas with a system prompt and a scoped toolset). Each has init/test scaffolding:
```shell
$ ctx connector init    # scaffold a new Lua connector from a template
$ ctx connector test    # run it without writing to the DB
$ ctx agent init
$ ctx agent test
$ ctx agent list
```
The agent system is the one I'm proudest of. An agent is a Lua script that, at resolve time, assembles its own context by querying the knowledge base, then hands the model a system prompt plus pre-loaded research and a scoped set of tools. Here's a real one, hn-writer, which writes Hacker News launch posts:
```lua
agent = {
    name = "hn-writer",
    description = "Write Hacker News posts by studying top HN content and your product docs",
    tools = { "search", "get" },          -- scoped: this agent can only search and fetch
    arguments = {
        { name = "style", description = "show_hn, launch, ask_hn, or discussion" },
        { name = "angle", description = "e.g. 'local-first', 'developer tooling'" },
        { name = "tone",  description = "technical, conversational, or minimal" },
    },
}

function agent.resolve(args, config, context)
    -- pre-load HN trends from the knowledge base
    for _, q in ipairs({ "Show HN", "Rust CLI tool", "local first", "AI context" }) do
        local results = context.search(q, { mode = "keyword", limit = 5 })
        -- ...fold the top results into the prompt as research...
    end
    -- ...also search the project's own docs, then return:
    return { system = system_prompt, tools = { "search", "get" }, messages = preloaded }
end
```
What I love about this pattern: the agent does its own retrieval before the model gets involved, so the model starts with both "what performs well on HN right now" (from a connector that ingests HN) and "what this product actually does" (from a filesystem connector over the docs) already in context. And there's a pleasant recursion to it: I have an agent whose job is to write the Show HN post for the tool the agent runs on. Its prompt even encodes the house style: "What HN hates: marketing speak, buzzwords, superlatives… technical substance over marketing language." Which, not coincidentally, is the ethos of this whole blog.
Sharing extensions: registries
Lua scripts are shareable, so ctx supports registries, which are git repos of community connectors, tools, and agents that sync in automatically:
```toml
[registries.community]
url = "https://github.com/parallax-labs/ctx-registry.git"
auto_update = true
readonly = true
```
A registry is just a versioned directory of .lua files; pointing at one makes its connectors and agents available locally. It's the same "distribute capability declaratively" idea I use for agent skills, applied to data connectors.
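Since a registry is only a directory of .lua files, discovering what it offers can be as simple as a directory scan. The sketch below is a guess at the shape of that step, not ctx's loader: the connectors/, tools/, and agents/ subdirectory names are hypothetical, and the example builds a throwaway directory instead of cloning a real registry.

```python
import tempfile
from pathlib import Path

# Assumption: one subdirectory per extension kind. The real layout may differ.
KINDS = ("connectors", "tools", "agents")


def discover(registry_root):
    """Map each extension kind to the sorted names of its .lua scripts."""
    root = Path(registry_root)
    found = {}
    for kind in KINDS:
        folder = root / kind
        found[kind] = (
            sorted(p.stem for p in folder.glob("*.lua")) if folder.is_dir() else []
        )
    return found


# Build a throwaway registry checkout to scan.
with tempfile.TemporaryDirectory() as tmp:
    for rel in ("agents/hn-writer.lua", "connectors/hn.lua", "connectors/README.md"):
        path = Path(tmp, rel)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text("-- placeholder\n")
    found = discover(tmp)

print(found)
```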
Static-site search, for free
One more trick. ctx export dumps the whole index to JSON for client-side search:
```shell
$ ctx export
# Exports documents and chunks to JSON for use with ctx-search.js —
# client-side search on a static site.
```
Which means the same engine that grounds my AI tools could also power search on this blog: index the posts, export the JSON, search it in the browser with no backend. (My terminal theme already runs a database in the browser, so this is a natural next step.)
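To show how little a backend-free search needs once the data is plain JSON, here is a small keyword ranker over an export-shaped document. It is written in Python to match the other sketches, whereas the real client is ctx-search.js running in the browser. The JSON shape (documents with id and title, chunks with document_id and text) is an assumption; the article only says the export contains documents and chunks.

```python
import json
import re

# Assumed export shape; the real file written by `ctx export` may differ.
EXPORT = json.loads("""
{
  "documents": [
    {"id": "d1", "title": "Retry policy"},
    {"id": "d2", "title": "Lunch spots"}
  ],
  "chunks": [
    {"document_id": "d1", "text": "We retry failed requests with exponential backoff."},
    {"document_id": "d1", "text": "Retry budgets cap the total number of attempts."},
    {"document_id": "d2", "text": "The sandwich place closes at three."}
  ]
}
""")


def search(export, query):
    """Rank documents by how many query-term occurrences their chunks contain."""
    terms = re.findall(r"\w+", query.lower())
    titles = {d["id"]: d["title"] for d in export["documents"]}
    hits = {}
    for chunk in export["chunks"]:
        words = re.findall(r"\w+", chunk["text"].lower())
        count = sum(words.count(t) for t in terms)
        if count:
            doc_id = chunk["document_id"]
            hits[doc_id] = hits.get(doc_id, 0) + count
    ranked = sorted(hits.items(), key=lambda kv: kv[1], reverse=True)
    return [(titles[doc_id], count) for doc_id, count in ranked]


print(search(EXPORT, "retry backoff"))  # [('Retry policy', 3)]
```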
Honest trade-offs
Context Harness is young, and local-first is a set of trade-offs, not a free lunch:

- Local embeddings are private and free but lower-quality than the big cloud models. provider lets you choose per use case, but you don't get both at once.
- SQLite scales to a personal knowledge base, not a team's corpus. That's the design target, not a bug, but know the ceiling.
- Lua is power and rope. Scriptable connectors mean I can ingest anything; they also mean a bad script can do bad things. connector test (which never writes to the DB) exists for exactly that reason.
But the core bet has paid off: a single Rust binary, a SQLite file, optional fully-local embeddings, and a standard protocol is enough to give every AI tool I use grounded access to my own context, without renting a vector database to do it.
Parker Jones, parkerjones.dev
