<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
    <title>Parker Jones Dev Blog - chainlit</title>
    <subtitle>Dev Blog of Parker Jones</subtitle>
    <link rel="self" type="application/atom+xml" href="https://parkerjones.dev/tags/chainlit/atom.xml"/>
    <link rel="alternate" type="text/html" href="https://parkerjones.dev"/>
    <generator uri="https://www.getzola.org/">Zola</generator>
    <updated>2025-02-07T00:00:00+00:00</updated>
    <id>https://parkerjones.dev/tags/chainlit/atom.xml</id>
    <entry xml:lang="en">
        <title>Reproducible, Secret-Safe AI Agents with Nix Flakes, agenix, and Magentic-One</title>
        <published>2025-02-07T00:00:00+00:00</published>
        <updated>2025-02-07T00:00:00+00:00</updated>
        
        <author>
          <name>Parker Jones</name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://parkerjones.dev/posts/lab-agency/"/>
        <id>https://parkerjones.dev/posts/lab-agency/</id>
        
        <content type="html" xml:base="https://parkerjones.dev/posts/lab-agency/">&lt;p&gt;Most &quot;I built an AI agent&quot; posts skip the unglamorous parts: how the environment gets set up identically on another machine, where the API key actually lives, and what breaks when you try to make it reproducible. Those are the parts I find interesting, so this post is about the &lt;em&gt;plumbing&lt;&#x2F;em&gt; of &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;parallax-labs&#x2F;lab-agency&quot;&gt;Lab Agency&lt;&#x2F;a&gt; — a multi-agent app wired together with Microsoft&#x27;s &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.microsoft.com&#x2F;en-us&#x2F;research&#x2F;articles&#x2F;magentic-one-a-generalist-multi-agent-system-for-solving-complex-tasks&#x2F;&quot;&gt;Magentic-One&lt;&#x2F;a&gt; — built on a Nix flake with secrets managed by &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;ryantm&#x2F;agenix&quot;&gt;agenix&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;The agents are the fun part. Getting them to start the same way every time, with a decrypted key and the right Python deps, is the part that actually took the work.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-flake-one-entry-point-several-jobs&quot;&gt;The flake: one entry point, several jobs&lt;&#x2F;h2&gt;
&lt;p&gt;The whole project is defined by a &lt;code&gt;flake.nix&lt;&#x2F;code&gt; with a few inputs and a &lt;code&gt;flake-utils&lt;&#x2F;code&gt; wrapper so it builds across systems:&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code data-lang=&quot;nix&quot;&gt;inputs = {
  nixpkgs.url    = &amp;quot;github:NixOS&#x2F;nixpkgs&#x2F;nixos-unstable&amp;quot;;
  flake-utils.url = &amp;quot;github:numtide&#x2F;flake-utils&amp;quot;;
  agenix.url     = &amp;quot;github:ryantm&#x2F;agenix&amp;quot;;
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;From there it exposes four targets, each a different way into the same project:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;devShell&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; — an interactive shell for hacking on the code.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;run-surfer&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; (default) — boots the Chainlit app.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;inspect-embeddings&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; — pokes at the vector store.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;index-documents&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; — loads documents into the knowledge base.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;The point of doing it this way is that &quot;clone and run&quot; is &lt;em&gt;true&lt;&#x2F;em&gt;. There&#x27;s no README step that says &quot;first, set these five environment variables and install these packages.&quot; The flake target is the setup.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;secrets-that-decrypt-themselves-in-the-shell&quot;&gt;Secrets that decrypt themselves, in the shell&lt;&#x2F;h2&gt;
&lt;p&gt;The piece I&#x27;m proudest of is how the OpenAI key is handled: it&#x27;s never on disk in plaintext and never pasted into a shell. It lives age-encrypted at &lt;code&gt;secrets&#x2F;openai.txt&lt;&#x2F;code&gt;, and the flake decrypts it into the environment as the shell starts up:&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code data-lang=&quot;nix&quot;&gt;devShell = pkgs.mkShell {
  buildInputs = [ agenixCli pkgs.python3 ];
  shellHook = &amp;#39;&amp;#39;
    echo &amp;quot;Decrypting OpenAI secret...&amp;quot;
    export OPENAI_API_KEY=$(agenix --decrypt secrets&#x2F;openai.txt \
      --identity ~&#x2F;.ssh&#x2F;parallaxis)

    if [ ! -d venv ]; then
      python3 -m venv venv
      source venv&#x2F;bin&#x2F;activate
      pip install --upgrade pip
      pip install -r requirements.txt
    else
      source venv&#x2F;bin&#x2F;activate
    fi
  &amp;#39;&amp;#39;;
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;&lt;code&gt;agenixCli&lt;&#x2F;code&gt; here is just &lt;code&gt;agenix.packages.${system}.default&lt;&#x2F;code&gt; pulled from the input. Decryption keys off my SSH identity (&lt;code&gt;~&#x2F;.ssh&#x2F;parallaxis&lt;&#x2F;code&gt;), so the encrypted secret can sit in the repo and only someone holding the right key can read it. The encrypted file is safe to commit; the key never is.&lt;&#x2F;p&gt;
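&lt;p&gt;Because the key arrives through the environment, the Python side is a natural place to fail fast when decryption didn&#x27;t happen. One shell subtlety makes that worth doing: in bash, &lt;code&gt;export VAR=$(cmd)&lt;&#x2F;code&gt; reports the exit status of &lt;code&gt;export&lt;&#x2F;code&gt;, not of &lt;code&gt;cmd&lt;&#x2F;code&gt;, so a failed &lt;code&gt;agenix --decrypt&lt;&#x2F;code&gt; can leave &lt;code&gt;OPENAI_API_KEY&lt;&#x2F;code&gt; set to an empty string without the shell noticing. Here is a minimal sketch of a startup check, assuming the app reads the key from &lt;code&gt;os.environ&lt;&#x2F;code&gt;; &lt;code&gt;require_openai_key&lt;&#x2F;code&gt; is a hypothetical helper, not code from the repo:&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code data-lang=&quot;python&quot;&gt;import os
from typing import Mapping, Optional


def require_openai_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return OPENAI_API_KEY, raising if the shell did not provide a usable one.

    Hypothetical helper. An empty value usually means the agenix decrypt
    step failed and the surrounding export masked the failure.
    """
    source = os.environ if env is None else env
    key = source.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is missing or empty; check that secrets/openai.txt "
            "decrypted with your SSH identity"
        )
    return key
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Called once near the top of &lt;code&gt;clapp.py&lt;&#x2F;code&gt;, a check like this would turn an empty key into an immediate, readable error rather than an authentication failure from the first agent that calls the API.&lt;&#x2F;p&gt;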
&lt;p&gt;The &lt;code&gt;run-surfer&lt;&#x2F;code&gt; target repeats the same decrypt-and-venv dance inside a &lt;code&gt;writeShellScriptBin&lt;&#x2F;code&gt; so the app launches with one command:&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code data-lang=&quot;nix&quot;&gt;runSurfer = pkgs.writeShellScriptBin &amp;quot;run-surfer&amp;quot; &amp;#39;&amp;#39;
  # ...ensure venv, then:
  # agenix by store path: nix run does not put it on PATH
  export OPENAI_API_KEY=$(${agenixCli}&#x2F;bin&#x2F;agenix --decrypt secrets&#x2F;openai.txt \
    --identity ~&#x2F;.ssh&#x2F;parallaxis)
  exec chainlit run clapp.py -w
&amp;#39;&amp;#39;;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;&lt;strong&gt;An honest caveat:&lt;&#x2F;strong&gt; there&#x27;s a Python &lt;code&gt;venv&lt;&#x2F;code&gt; living &lt;em&gt;inside&lt;&#x2F;em&gt; a Nix flake here, which is not pure Nix and a purist would wince. I made that trade deliberately — the AI ecosystem moves fast and pinning everything through nixpkgs would mean fighting the toolchain instead of building the product. Nix gives me a reproducible &lt;em&gt;shell&lt;&#x2F;em&gt; (right Python, right &lt;code&gt;agenix&lt;&#x2F;code&gt;, right secret); &lt;code&gt;pip install -r requirements.txt&lt;&#x2F;code&gt; handles the fast-moving libraries. It&#x27;s a pragmatic seam, not a principled one, and I&#x27;d reconsider it the moment the dependency set stabilizes.&lt;&#x2F;p&gt;
&lt;p&gt;Magentic-One&#x27;s CLI does get the full Nix treatment, though — it&#x27;s packaged in its own &lt;code&gt;autogen-flake&#x2F;magentic-one-cli.nix&lt;&#x2F;code&gt; rather than left to pip.&lt;&#x2F;p&gt;
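&lt;p&gt;Since the seam runs between Nix&#x27;s Python and a pip-managed venv, it can help to log which side of it the app actually started on. A small sketch using the standard-library convention that inside a venv &lt;code&gt;sys.prefix&lt;&#x2F;code&gt; differs from &lt;code&gt;sys.base_prefix&lt;&#x2F;code&gt;; both function names are hypothetical:&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code data-lang=&quot;python&quot;&gt;import sys


def in_virtualenv() -> bool:
    """True when the interpreter is running from a venv (python3 -m venv)."""
    # In a venv, sys.prefix points at the venv directory, while
    # sys.base_prefix still points at the base interpreter (here, the Nix one).
    return sys.prefix != sys.base_prefix


def describe_interpreter() -> str:
    """One-line summary worth logging when the app boots."""
    where = "venv" if in_virtualenv() else "base interpreter"
    return f"Python {sys.version.split()[0]} ({where}) at {sys.executable}"
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;If &lt;code&gt;in_virtualenv()&lt;&#x2F;code&gt; comes back &lt;code&gt;False&lt;&#x2F;code&gt; under &lt;code&gt;run-surfer&lt;&#x2F;code&gt;, the venv activation step didn&#x27;t take effect, and imports will resolve against whatever the base interpreter happens to have.&lt;&#x2F;p&gt;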
&lt;h2 id=&quot;the-agents-a-magentic-one-team-plus-a-rag-layer&quot;&gt;The agents: a Magentic-One team plus a RAG layer&lt;&#x2F;h2&gt;
&lt;p&gt;The app itself (&lt;code&gt;clapp.py&lt;&#x2F;code&gt;) is a &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;chainlit.io&#x2F;&quot;&gt;Chainlit&lt;&#x2F;a&gt; chat front-end over a Magentic-One group chat. Magentic-One ships a roster of specialist agents, and Lab Agency assembles them through AutoGen&#x27;s extensions:&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code data-lang=&quot;python&quot;&gt;from autogen_agentchat.teams import MagenticOneGroupChat
from autogen_ext.agents.file_surfer import FileSurfer
from autogen_ext.agents.web_surfer import MultimodalWebSurfer
from autogen_ext.agents.magentic_one import MagenticOneCoderAgent
from autogen_ext.code_executors.local import LocalCommandLineCodeExecutor
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;So the team can read files (&lt;code&gt;FileSurfer&lt;&#x2F;code&gt;), browse the web (&lt;code&gt;MultimodalWebSurfer&lt;&#x2F;code&gt;), write code (&lt;code&gt;MagenticOneCoderAgent&lt;&#x2F;code&gt;), and execute it locally (&lt;code&gt;LocalCommandLineCodeExecutor&lt;&#x2F;code&gt;) — Magentic-One&#x27;s orchestrator decides who does what for a given task.&lt;&#x2F;p&gt;
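&lt;p&gt;For readers who haven&#x27;t used these classes, here is a minimal sketch of how they typically fit together. It is modeled loosely on the way AutoGen 0.4&#x27;s own Magentic-One helper wires its team, not copied from Lab Agency&#x27;s &lt;code&gt;clapp.py&lt;&#x2F;code&gt;; the agent names, the &lt;code&gt;gpt-4o&lt;&#x2F;code&gt; model, and &lt;code&gt;build_team&lt;&#x2F;code&gt; itself are assumptions. The imports live inside the function so the module loads even without AutoGen installed:&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code data-lang=&quot;python&quot;&gt;def build_team(model: str = "gpt-4o"):
    """Assemble a Magentic-One team from AutoGen 0.4-style components.

    Hypothetical sketch. The OpenAI client reads OPENAI_API_KEY from the
    environment, and the terminal agent runs model-written code on the host.
    """
    from autogen_agentchat.agents import CodeExecutorAgent
    from autogen_agentchat.teams import MagenticOneGroupChat
    from autogen_ext.agents.file_surfer import FileSurfer
    from autogen_ext.agents.magentic_one import MagenticOneCoderAgent
    from autogen_ext.agents.web_surfer import MultimodalWebSurfer
    from autogen_ext.code_executors.local import LocalCommandLineCodeExecutor
    from autogen_ext.models.openai import OpenAIChatCompletionClient

    client = OpenAIChatCompletionClient(model=model)
    participants = [
        FileSurfer("FileSurfer", model_client=client),
        MultimodalWebSurfer("WebSurfer", model_client=client),
        MagenticOneCoderAgent("Coder", model_client=client),
        # The executor only runs code; CodeExecutorAgent is the team member
        # that picks code blocks out of the chat and hands them to it.
        CodeExecutorAgent("ComputerTerminal",
                          code_executor=LocalCommandLineCodeExecutor()),
    ]
    # The orchestrator inside MagenticOneGroupChat plans the task and
    # chooses which participant acts next.
    return MagenticOneGroupChat(participants, model_client=client)
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Running it would then be something like &lt;code&gt;await build_team().run(task=...)&lt;&#x2F;code&gt;, for example from an async Chainlit message handler.&lt;&#x2F;p&gt;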
&lt;p&gt;On top of that roster I added two custom agents to give the team a memory. A &lt;strong&gt;&lt;code&gt;KnowledgeAgent&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; does retrieval-augmented generation against a &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.trychroma.com&#x2F;&quot;&gt;Chroma&lt;&#x2F;a&gt; collection:&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code data-lang=&quot;python&quot;&gt;async def retrieve_knowledge(self, query):
    query_embedding = self.embedding_function([query])[0]
    results = self.collection.query(
        query_embeddings=[query_embedding],
        n_results=5,
    )
    return results[&amp;#39;documents&amp;#39;][0]
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;…and an &lt;strong&gt;&lt;code&gt;EmbeddingAgent&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; writes back to it, so research the team produces during a session can be folded into the knowledge base for the next one:&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code data-lang=&quot;python&quot;&gt;async def add_to_knowledge_base(self, content):
    embedding = self.embedding_function([content])[0]
    doc_id = f&amp;quot;doc_{self.collection.count()}&amp;quot;
    self.collection.add(documents=[content], ids=[doc_id],
                        embeddings=[embedding])
    return doc_id
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The store itself is just a local &lt;code&gt;chroma_db&#x2F;chroma.sqlite3&lt;&#x2F;code&gt; — no managed vector database, no extra service to stand up. That keeps the whole thing runnable on a laptop, which was the goal.&lt;&#x2F;p&gt;
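&lt;p&gt;To make the write-back loop concrete without installing Chroma, here is a toy in-memory stand-in for the &lt;code&gt;add&lt;&#x2F;code&gt; and &lt;code&gt;query&lt;&#x2F;code&gt; calls above. It is not how Chroma works internally: it scans every record with squared L2 distance (Chroma&#x27;s default metric, though a collection can be configured for cosine or inner product) instead of using an approximate index, and it takes pre-computed embeddings instead of calling a model. &lt;code&gt;ToyKnowledgeBase&lt;&#x2F;code&gt; and its methods are hypothetical, and it uses random UUIDs as IDs rather than a count-based scheme:&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code data-lang=&quot;python&quot;&gt;import uuid
from typing import List, Sequence, Tuple


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class ToyKnowledgeBase:
    """In-memory stand-in for a vector collection (illustration only)."""

    def __init__(self) -> None:
        # Each record is (doc_id, document_text, embedding).
        self._records: List[Tuple[str, str, List[float]]] = []

    def add(self, content: str, embedding: Sequence[float]) -> str:
        # Random IDs stay unique even if records are later removed.
        doc_id = f"doc_{uuid.uuid4().hex}"
        self._records.append((doc_id, content, list(embedding)))
        return doc_id

    def query(self, embedding: Sequence[float], n_results: int = 5) -> List[str]:
        # Brute-force nearest neighbours: score every record, keep the closest.
        ranked = sorted(self._records, key=lambda r: squared_l2(embedding, r[2]))
        return [text for _, text, _ in ranked[:n_results]]


if __name__ == "__main__":
    kb = ToyKnowledgeBase()
    kb.add("nix flakes pin inputs", [1.0, 0.0])
    kb.add("agenix decrypts with ssh keys", [0.0, 1.0])
    kb.add("chroma stores vectors in sqlite", [0.7, 0.7])
    print(kb.query([0.9, 0.1], n_results=2))
    # prints ['nix flakes pin inputs', 'chroma stores vectors in sqlite']
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The brute-force scan is fine for a handful of documents; the point of a real vector store is keeping that query fast, and persistent, once the collection grows.&lt;&#x2F;p&gt;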
&lt;h2 id=&quot;loading-the-knowledge-base&quot;&gt;Loading the knowledge base&lt;&#x2F;h2&gt;
&lt;p&gt;The &lt;code&gt;index-documents&lt;&#x2F;code&gt; target runs a standalone script that chunks documents by token count using &lt;code&gt;tiktoken&lt;&#x2F;code&gt;, embeds them through &lt;code&gt;AsyncOpenAI&lt;&#x2F;code&gt;, and stores them in the &lt;code&gt;agent_knowledge_base&lt;&#x2F;code&gt; Chroma collection. Chunking on tokens rather than characters matters here because embedding models measure their input limit in tokens, and a character count can only approximate it. Overshoot with OpenAI&#x27;s embeddings endpoint and the request is rejected outright; many local embedding models instead truncate the overflow without complaint, which is the nastier failure: nothing errors, retrieval just quietly gets worse.&lt;&#x2F;p&gt;
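&lt;p&gt;Here is a minimal sketch of that chunk-then-embed step, assuming &lt;code&gt;tiktoken&lt;&#x2F;code&gt;&#x27;s &lt;code&gt;cl100k_base&lt;&#x2F;code&gt; encoding (the one OpenAI&#x27;s &lt;code&gt;text-embedding-3-*&lt;&#x2F;code&gt; models use) and an &lt;code&gt;AsyncOpenAI&lt;&#x2F;code&gt; client supplied by the caller. The chunk size, overlap, model name, and function names are illustrative assumptions, not values from the repo&#x27;s indexing script; the windowing logic is kept separate so it can be checked without &lt;code&gt;tiktoken&lt;&#x2F;code&gt; installed:&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code data-lang=&quot;python&quot;&gt;from typing import List, Sequence


def window_tokens(tokens: Sequence[int], max_tokens: int,
                  overlap: int = 0) -> List[List[int]]:
    """Split token IDs into windows of at most max_tokens.

    Consecutive windows share `overlap` tokens, so text near a boundary
    appears in both neighbouring chunks.
    """
    step = max_tokens - overlap
    if not (overlap >= 0 and step >= 1):
        raise ValueError("overlap must be non-negative and smaller than max_tokens")
    windows: List[List[int]] = []
    for start in range(0, len(tokens), step):
        windows.append(list(tokens[start:start + max_tokens]))
        if start + max_tokens >= len(tokens):
            break  # this window already reached the end of the input
    return windows


def chunk_text(text: str, max_tokens: int = 512, overlap: int = 64,
               encoding_name: str = "cl100k_base") -> List[str]:
    """Chunk text by token count using tiktoken (imported lazily)."""
    import tiktoken

    enc = tiktoken.get_encoding(encoding_name)
    windows = window_tokens(enc.encode(text), max_tokens, overlap)
    # A window edge can split a multi-byte character; tiktoken's decode
    # replaces the broken bytes by default instead of raising.
    return [enc.decode(w) for w in windows]


async def embed_chunks(client, chunks: List[str],
                       model: str = "text-embedding-3-small") -> List[List[float]]:
    """Embed all chunks in one request with an AsyncOpenAI-style client."""
    response = await client.embeddings.create(model=model, input=chunks)
    # Order by each item's index so vectors line up with the input chunks.
    return [item.embedding for item in sorted(response.data, key=lambda d: d.index)]
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Something like &lt;code&gt;vectors = await embed_chunks(AsyncOpenAI(), chunk_text(doc))&lt;&#x2F;code&gt; would then produce the embeddings to pass alongside the chunks to &lt;code&gt;collection.add&lt;&#x2F;code&gt;; a large corpus would also need its chunks spread across several requests.&lt;&#x2F;p&gt;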
&lt;h2 id=&quot;what-i-d-carry-forward&quot;&gt;What I&#x27;d carry forward&lt;&#x2F;h2&gt;
&lt;p&gt;Stepping back, the parts of this project I&#x27;d reuse on the next one aren&#x27;t the agents — those libraries will have moved on by next quarter. It&#x27;s the &lt;em&gt;scaffolding&lt;&#x2F;em&gt;:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;A flake target per task&lt;&#x2F;strong&gt; turns documentation into executable setup. &quot;How do I run this?&quot; has a literal command as the answer.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;agenix in a &lt;code&gt;shellHook&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; means a real API key is present in the environment without ever being plaintext on disk or in shell history. This is the pattern I&#x27;ll copy into everything.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Local Chroma over SQLite&lt;&#x2F;strong&gt; is enough vector store for a single-node agent app, and skipping the managed service kept the project laptop-runnable.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;The honest roadmap item is the same one every project like this has: containerize it and put it somewhere it can run unattended. The flake makes that a smaller leap than it would otherwise be — the build is already declarative. The code, secrets-handling, and flake are all on GitHub: &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;parallax-labs&#x2F;lab-agency&quot;&gt;parallax-labs&#x2F;lab-agency&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;em&gt;— Parker Jones, &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;parkerjones.dev&quot;&gt;parkerjones.dev&lt;&#x2F;a&gt;&lt;&#x2F;em&gt;&lt;&#x2F;p&gt;
</content>
        
    </entry>
</feed>
