"Off-Grid Operator #3: Running Elixir in Production from a Cabin"

Most infrastructure advice assumes infinite power and perfect internet.

I have neither.

I run Elixir/Phoenix apps from an off-grid cabin in the Yukon. Power comes from batteries + generator. Internet comes from Starlink. One server sits in the cabin, another in town, and a VPS handles public traffic. It sounds fragile. Weirdly, it made my systems better.

Why Elixir survives this environment

Elixir is boring in all the right ways when you run infrastructure with real-world constraints:

Fault-tolerant by default — processes crash, supervisors restart, life goes on.
Low memory footprint for many concurrent tasks.
Great at long-running systems (webhooks, background jobs, stateful services).
Operational clarity — if something dies, you can usually find out why quickly.

In an off-grid setup, resilience isn’t a nice-to-have. It’s the whole game.

The practical architecture

I keep it simple:

Cabin node: Home Assistant + n8n + internal support services.
Town server: heavier app workloads (Mission Control, memory services, container management).
VPS: public websites and anything that needs open internet ingress.
Network: Tailscale mesh between all nodes.

Phoenix services can talk to each other over private Tailscale addresses. I don’t expose internal APIs to the public internet unless absolutely necessary.

Build times are the tax you pay

On modest hardware, Docker builds for Phoenix apps can take 5–8 minutes.

That’s the tax. You either complain about it, or design around it:

Fewer deploys, each with more value.
Avoid dependency churn.
Keep release discipline tight (version bump, changelog, reproducible build).
Don’t redeploy because you changed one typo in a dashboard label.

This sounds obvious until you pay the latency bill every day.

What actually breaks (and how to design for it)

Three common failure modes:

Transient network blips during deploys
Fix: retries + resumable workflows + idempotent hooks.
Background job stampedes when links recover
Fix: queue control, backoff, and explicit concurrency limits.
Power-aware scheduling
Fix: run heavy work when battery state supports it; delay non-urgent jobs.

The pattern: treat physical constraints as first-class production signals.

Off-grid changes your definition of “production-ready”

In cloud-heavy teams, “production-ready” often means passing CI and green dashboards.

In this setup, production-ready means:

It tolerates intermittent connectivity.
It retries safely without duplication.
It fails loud enough to notice.
It can recover without a human babysitting it.
It doesn’t burn battery for vanity workloads.

It’s less glamorous, more reliable, and way cheaper.

Next post: memory that doesn’t vanish between AI sessions — how I built a practical long-term memory stack for agents that forget everything every restart.

If you want this working in your stack — not a tutorial, actual implementation — that’s what I do. Work with me →