"Off-Grid CTO Dispatches #2: The AI Agent Team"
Last week I wrote about the off-grid infrastructure stack.
This week covers the layer on top of it: the AI team that runs on that stack.
Not “AI magic.” Not autonomous AGI nonsense. Just a practical orchestration system that helps me ship more with less context switching.
I run one orchestrator agent (Xander) plus specialist agents for engineering, infrastructure, research, writing, design, and finance. Each has a role, constraints, and a clear handoff pattern.
The key idea is simple: one agent should decide, specialists should execute.
The org chart
The structure is intentionally boring:
- Xander — orchestrator, triage, planning, and quality gate
- Willow — coding and engineering execution
- Spike — Docker, deploys, infra, and production checks
- Giles — research and analysis
- Oz — external writing and polished copy
- Cordelia — frontend/design work
- Anya — finance and billing support
This isn’t roleplay. It’s workload partitioning.
If one agent is both deciding and building everything, context gets mushy and quality drops. A specialist model with a narrow brief does better work faster, and the orchestrator keeps the system coherent.
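A minimal sketch of that partitioning, assuming the setup is code-driven (the agent names are from this post; the registry schema and `route` function are illustrative, not the actual implementation):

```python
# Hypothetical role registry. Agent names come from the org chart above;
# the schema and routing logic are an illustrative sketch.
AGENTS = {
    "Xander":   {"role": "orchestrator", "scope": ["triage", "planning", "review"]},
    "Willow":   {"role": "specialist",   "scope": ["coding"]},
    "Spike":    {"role": "specialist",   "scope": ["docker", "deploys", "infra"]},
    "Giles":    {"role": "specialist",   "scope": ["research"]},
    "Oz":       {"role": "specialist",   "scope": ["writing"]},
    "Cordelia": {"role": "specialist",   "scope": ["frontend", "design"]},
    "Anya":     {"role": "specialist",   "scope": ["finance", "billing"]},
}

def route(task_tags: set[str]) -> str:
    """Return the single specialist whose scope covers the task, else escalate."""
    matches = [name for name, spec in AGENTS.items()
               if spec["role"] == "specialist" and task_tags & set(spec["scope"])]
    # Exactly one owner per task; anything ambiguous goes back to the orchestrator.
    return matches[0] if len(matches) == 1 else "Xander"
```

Note the escalation rule: a task matching zero or two-plus specialists is a sign the brief is underspecified, so it routes back to the orchestrator instead of guessing.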
How work actually flows
A normal feature looks like this:
- Xander reads task context and writes a tight brief
- Specialist agent executes in isolation
- Xander verifies output (tests, correctness, side effects)
- Task board gets updated with notes and next step
The verification step is non-negotiable. Agent output is not truth. It’s a draft until checked.
That one rule prevents most “looks good, shipped broken” failures.
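The brief → execute → verify → record loop can be sketched in a few lines. This is a minimal illustration, not the real system: the `Brief` fields mirror the rules later in this post, and the specialist and verifier are stand-in callables.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Brief:
    goal: str
    acceptance: list[str]                           # checks that gate "done"
    do_not_touch: list[str] = field(default_factory=list)

def run_task(brief: Brief,
             specialist: Callable[[Brief], str],
             verify: Callable[[str, Brief], bool],
             board: dict) -> str:
    """Orchestrator loop: brief -> execute -> verify -> record on the board."""
    draft = specialist(brief)                       # specialist executes in isolation
    # Agent output is a draft until checked; verification gates the status.
    status = "done" if verify(draft, brief) else "needs-rework"
    board[brief.goal] = {"status": status, "notes": draft}
    return status
```

The point of the shape: the specialist never writes its own status. Only the verification step decides between "done" and "needs-rework".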
What went wrong early
I made two classic mistakes.
1) Overlapping agents on the same area
I used to spin up multiple agents in parallel on related code paths. On paper it looked efficient. In practice it caused collisions, duplicate effort, and weirdly inconsistent outcomes.
Now the rule is: one sub-agent per feature area at a time.
Parallelize across independent tracks, not inside a single one.
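Mechanically, "one sub-agent per feature area" is just a mutex keyed by area. A toy sketch (the ownership map and function names are hypothetical):

```python
# One sub-agent per feature area at a time: an ownership map acting as a
# per-area lock. Structure and names are illustrative.
active: dict[str, str] = {}   # feature area -> agent currently working it

def claim(area: str, agent: str) -> bool:
    """Grant the area only if it is free or already held by this agent."""
    if active.get(area, agent) != agent:
        return False          # collision: another agent owns this area
    active[area] = agent
    return True

def release(area: str, agent: str) -> None:
    """Free the area once the agent's task in it is closed out."""
    if active.get(area) == agent:
        del active[area]
```

A failed `claim` is the signal to parallelize elsewhere: pick up a task in an independent area instead of piling a second agent onto the same one.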
2) No reliable work-in-progress tracking
LLM sessions are stateless by default. That’s fine for a single turn, terrible for long-running work.
I had tasks that were “in progress” in my head but nowhere in a durable system. Then a context switch happened and they disappeared into the void.
Fix: every non-trivial task gets a WIP entry at start and is explicitly closed at finish. If it isn’t in WIP, it doesn’t exist.
That one change reduced dropped threads more than any model upgrade.
The operating rules that made it stable
These rules did more for reliability than prompt tweaks:
- Plan first, then execute. Define goal, constraints, and done-state before any tool calls.
- Short, explicit briefs. Scope, files, acceptance criteria, and what not to touch.
- Specialists only do specialist work. No generic “everyone does everything” mode.
- Verify before announce. The builder doesn’t grade their own homework.
- Async follow-through. If a task is async, schedule a check to confirm completion.
In other words: treat agents like contractors, not oracles.
Why this beats solo mode
The biggest gain isn’t raw speed. It’s reduced context switching.
I can keep high-level focus while specialist agents run implementation, research, and ops checks in parallel lanes. That means fewer “where was I?” resets and more actual shipped work.
It also improves quality because each lane has a clear owner and explicit review.
This is the same pattern good engineering teams use with humans: clear roles, clear interfaces, clear accountability. The only difference is some teammates are language models.
The hard limit nobody wants to say out loud
Agents are only as good as the operating system around them.
If your task board is vague, they’ll be vague. If your acceptance criteria are fuzzy, results will be fuzzy. If your process has no verification, you’ll deploy fiction eventually.
The model matters. The system matters more.
What I’d do first if you want this setup
If you’re building your own agent workflow, start here:
- Define 3–5 specialist roles with non-overlapping scope
- Add mandatory WIP tracking for non-trivial work
- Require verification before any “done” update
- Use one shared task board with research notes tied to tasks
- Add a follow-up check for every async promise
Don’t start with ten agents and a giant prompt file. Start with operating discipline.
That’s what compounds.
Next week: using Home Assistant as a real monitoring layer — battery, temperature, and infrastructure telemetry — not just smart lights and novelty automations.
If you want this working in your stack — not a tutorial, actual implementation — that’s what I do. Work with me →