Reproducible, Secret-Safe AI Agents with Nix Flakes, agenix, and Magentic-One

2025-02-07T00:00:00+00:00

Most "I built an AI agent" posts skip the unglamorous parts: how the environment gets set up identically on another machine, where the API key actually lives, and what breaks when you try to make it reproducible. Those are the parts I find interesting, so this post is about the plumbing of Lab Agency, a multi-agent app wired together with Microsoft's Magentic-One and built on a Nix flake with secrets managed by agenix.

The agents are the fun part. Getting them to start the same way every time, with a decrypted key and the right Python deps, is the part that actually took the work.

The flake: one entry point, several jobs

The whole project is defined by a flake.nix with a few inputs and a flake-utils wrapper so it builds across systems:

    inputs = {
      nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";
      flake-utils.url = "github:numtide/flake-utils";
      agenix.url = "github:ryantm/agenix";
    };

From there it exposes four targets, each a different way into the same project:

- devShell: an interactive shell for hacking on the code.
- run-surfer (default): boots the Chainlit app.
- inspect-embeddings: pokes at the vector store.
- index-documents: loads documents into the knowledge base.

The point of doing it this way is that "clone and run" is true. There's no README step that says "first, set these five environment variables and install these packages." The flake target is the setup.

Secrets that decrypt themselves, in the shell

The piece I'm proudest of is how the OpenAI key is handled: it's never on disk in plaintext and never pasted into a shell. It lives age-encrypted at secrets/openai.txt, and the flake decrypts it into the environment as the shell starts up:

    devShell = pkgs.mkShell {
      buildInputs = [ agenixCli pkgs.python3 ];
      shellHook = ''
        echo "Decrypting OpenAI secret..."
        export OPENAI_API_KEY=$(agenix --decrypt secrets/openai.txt \
          --identity ~/.ssh/parallaxis)
        if [ ! -d venv ]; then
          python3 -m venv venv
          source venv/bin/activate
          pip install --upgrade pip
          pip install -r requirements.txt
        else
          source venv/bin/activate
        fi
      '';
    };

agenixCli here is just agenix.packages.${system}.default pulled from the input. Decryption keys off my SSH identity (~/.ssh/parallaxis), so the encrypted secret can sit in the repo and only someone holding the right key can read it. The encrypted file is safe to commit; the key never is.
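One failure mode is worth guarding against on the Python side. If decryption fails, a shell `export VAR=$(...)` still succeeds and exports the variable with an empty value, so the app can start with a blank key. Below is a minimal sketch of a fail-fast check, assuming the app reads the key from the environment at startup. `require_secret` and `MissingSecretError` are hypothetical names for illustration, not part of Lab Agency.

```python
import os
from typing import Mapping


class MissingSecretError(RuntimeError):
    """Raised when an expected secret is absent or blank in the environment."""


def require_secret(name: str, env: Mapping[str, str] = os.environ) -> str:
    """Return the secret stored under `name`, or raise if it is unusable.

    A failed `$(...)` substitution in the shell still exports the variable,
    just with an empty value, so blank strings are rejected as well.
    """
    # Strip whitespace so a trailing newline from the decrypted file is harmless.
    value = env.get(name, "").strip()
    if not value:
        raise MissingSecretError(
            f"{name} is missing or empty; did the decryption step succeed?"
        )
    return value
```

Calling `require_secret("OPENAI_API_KEY")` once at startup turns a silent misconfiguration into an immediate, readable error instead of a failed API call later in the session.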
The run-surfer target repeats the same decrypt-and-venv dance inside a writeShellScriptBin so the app launches with one command:

    runSurfer = pkgs.writeShellScriptBin "run-surfer" ''
      # ...ensure venv, then:
      export OPENAI_API_KEY=$(agenix --decrypt secrets/openai.txt \
        --identity ~/.ssh/parallaxis)
      exec chainlit run clapp.py -w
    '';

An honest caveat: there's a Python venv living inside a Nix flake here, which is not pure Nix and a purist would wince. I made that trade deliberately: the AI ecosystem moves fast and pinning everything through nixpkgs would mean fighting the toolchain instead of building the product. Nix gives me a reproducible shell (right Python, right agenix, right secret); pip install -r requirements.txt handles the fast-moving libraries. It's a pragmatic seam, not a principled one, and I'd reconsider it the moment the dependency set stabilizes. Magentic-One's CLI does get the full Nix treatment, though: it's packaged in its own autogen-flake/magentic-one-cli.nix rather than left to pip.

The agents: a Magentic-One team plus a RAG layer

The app itself (clapp.py) is a Chainlit chat front-end over a Magentic-One group chat. Magentic-One ships a roster of specialist agents, and Lab Agency assembles them through AutoGen's extensions:

    from autogen_agentchat.teams import MagenticOneGroupChat
    from autogen_ext.agents.file_surfer import FileSurfer
    from autogen_ext.agents.web_surfer import MultimodalWebSurfer
    from autogen_ext.agents.magentic_one import MagenticOneCoderAgent
    from autogen_ext.code_executors.local import LocalCommandLineCodeExecutor

So the team can read files (FileSurfer), browse the web (MultimodalWebSurfer), write code (MagenticOneCoderAgent), and execute it locally (LocalCommandLineCodeExecutor). Magentic-One's orchestrator decides who does what for a given task.
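The imports above show the roster but not the wiring. As a rough sketch, assembly could follow the pattern in AutoGen's Magentic-One documentation, where each agent is constructed with a shared model client and handed to the group chat. `build_team`, the agent display names, and wrapping the executor in `CodeExecutorAgent` are assumptions for illustration; the actual code in clapp.py may differ.

```python
def build_team(model_client):
    """Assemble a Magentic-One team from the agent classes named above.

    Hypothetical helper: imports are deferred so this module can be loaded
    even where AutoGen is not installed.
    """
    from autogen_agentchat.agents import CodeExecutorAgent
    from autogen_agentchat.teams import MagenticOneGroupChat
    from autogen_ext.agents.file_surfer import FileSurfer
    from autogen_ext.agents.magentic_one import MagenticOneCoderAgent
    from autogen_ext.agents.web_surfer import MultimodalWebSurfer
    from autogen_ext.code_executors.local import LocalCommandLineCodeExecutor

    participants = [
        FileSurfer("FileSurfer", model_client=model_client),
        MultimodalWebSurfer("WebSurfer", model_client=model_client),
        MagenticOneCoderAgent("Coder", model_client=model_client),
        # The executor is not an agent itself; it runs code on behalf of one.
        CodeExecutorAgent("Executor", code_executor=LocalCommandLineCodeExecutor()),
    ]
    # The group chat's orchestrator uses the same model client to plan
    # and to pick which participant handles each step.
    return MagenticOneGroupChat(participants, model_client=model_client)
```

Actually running this needs autogen-agentchat and autogen-ext installed plus a real model client, for example AutoGen's `OpenAIChatCompletionClient`, which by default reads the key from `OPENAI_API_KEY` in the environment.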
On top of that roster I added two custom agents to give the team a memory. A KnowledgeAgent does retrieval-augmented generation against a Chroma collection:

    async def retrieve_knowledge(self, query):
        query_embedding = self.embedding_function([query])[0]
        results = self.collection.query(
            query_embeddings=[query_embedding],
            n_results=5,
        )
        return results['documents'][0]

…and an EmbeddingAgent writes back to it, so research the team produces during a session can be folded into the knowledge base for the next one:

    async def add_to_knowledge_base(self, content):
        embedding = self.embedding_function([content])[0]
        doc_id = f"doc_{self.collection.count()}"
        self.collection.add(documents=[content], ids=[doc_id], embeddings=[embedding])
        return doc_id

The store itself is just a local chroma_db/chroma.sqlite3: no managed vector database, no extra service to stand up. That keeps the whole thing runnable on a laptop, which was the goal.

Loading the knowledge base

The index-documents target runs a standalone script that chunks documents by token count using tiktoken, embeds them through AsyncOpenAI, and stores them in the agent_knowledge_base Chroma collection. Chunking on tokens rather than characters matters here: it's what keeps each chunk inside the embedding model's window instead of getting silently truncated, which is the kind of bug that doesn't error, it just quietly makes your retrieval worse.

What I'd carry forward

Stepping back, the parts of this project I'd reuse on the next one aren't the agents; those libraries will have moved on by next quarter. It's the scaffolding:

- A flake target per task turns documentation into executable setup. "How do I run this?" has a literal command as the answer.
- agenix in a shellHook means a real API key is present in the environment without ever being plaintext on disk or in shell history. This is the pattern I'll copy into everything.
- Local Chroma over SQLite is enough vector store for a single-node agent app, and skipping the managed service kept the project laptop-runnable.

The honest roadmap item is the same one every project like this has: containerize it and put it somewhere it can run unattended. The flake makes that a smaller leap than it would otherwise be: the build is already declarative.

The code, secrets-handling, and flake are all on GitHub: parallax-labs/lab-agency.

Parker Jones, parkerjones.dev

Parker Jones Dev Blog - chainlit
