
hey everyone
a "can i run AI locally?" thread hit 1,340 points on hacker news this week, and a macOS-native agent sandbox got 753. the self-host wave isn't a 2024 hobbyist thing anymore. real founders are building real workloads on their own boxes.
we've been running 5 agents in production for 8 weeks at our boutique two-person shop. me on the front end, brandon on infra. nothing fancy. one rented box, one supervisor process, five long-running agents doing different things. wrote up the actual numbers because i'm tired of reading guides written by people who've never paid the inference bill themselves.
one $69/mo MicroVM with sudo. five agents running concurrently:
before this we were paying ~$340/mo across openrouter, an automation SaaS, and a research tool. swap-in cost was about a week of brandon's time. month one was rough. months two and three were boring in a good way.
week 3, the research agent ate all the RAM. classic context-window leak in a long-running loop. brandon caught it because we had basic monitoring wired up on the agents before launch. lesson: don't run anything for more than 24 hours without memory and token counters on a dashboard. the SaaS world hides this from you. when you self-host, "running for a month" reveals every leak.
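the counters don't need to be fancy. here's a minimal python sketch of the kind of watchdog i mean (stdlib only; the thresholds are made up, not our actual config):

```python
import resource

# hypothetical budgets -- tune for your box and your model
MAX_RSS_MB = 2048
MAX_TOKENS_PER_RUN = 200_000

class BudgetExceeded(RuntimeError):
    pass

def rss_mb() -> float:
    # ru_maxrss is kilobytes on linux
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024

def check_budgets(tokens_used: int) -> None:
    """call once per agent loop iteration; raise instead of leaking for a month."""
    if rss_mb() > MAX_RSS_MB:
        raise BudgetExceeded(f"RSS {rss_mb():.0f} MB over {MAX_RSS_MB} MB cap")
    if tokens_used > MAX_TOKENS_PER_RUN:
        raise BudgetExceeded(f"{tokens_used} tokens over {MAX_TOKENS_PER_RUN} cap")
```

the point is the raise: a crashed agent shows up on a dashboard, a leaking one doesn't.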
week 5 we tried hermes 3 70B for the coding agent. fast but kept hallucinating import paths. swapped back to a hosted claude call for that one job. mixed stack works fine. purity isn't a virtue when you're shipping.
week 6, brandon did a kernel update on the host without snapshotting first. lost 4 hours of agent state. we now snapshot every night, takes 90 seconds. if you're going to self-host, this is non-negotiable.
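the guard we bolted on afterwards is basically this (python sketch; the marker path is hypothetical, the idea is your snapshot job touches it and risky host work refuses to run without a fresh one):

```python
import os
import time

SNAPSHOT_MARKER = "/var/run/last-snapshot"  # hypothetical: touched by the nightly snapshot job

def snapshot_is_fresh(max_age_s: int = 86_400, marker: str = SNAPSHOT_MARKER) -> bool:
    """gate risky host work (kernel updates!) on a snapshot newer than max_age_s."""
    try:
        return (time.time() - os.path.getmtime(marker)) < max_age_s
    except FileNotFoundError:
        # no marker means no snapshot -- refuse
        return False
```

wire it into whatever runs your update script: `if not snapshot_is_fresh(): sys.exit("snapshot first")`.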
dollars in vs dollars saved over 8 weeks:
these savings don't include the 6 hours brandon spent on the OOM and the kernel snapshot story. real boutique math: wash on month 1, breakeven by week 6, gravy after that.
self-hosting AI agents is a deal once you accept it's not free. cheaper than a stack of SaaS subs, but it costs co-founder hours. if you don't have a brandon, don't do it. pay the 3x premium and stay sane.
if you do have someone who likes infra, the killer move isn't running gpt-class models locally. it's running 4 boring agents 24/7 doing things that would be too expensive on hosted APIs at always-on rates. SaaS charges per-call. self-host charges per-month. for always-on workloads the math flips fast.
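back-of-napkin version of how the math flips (every number here is illustrative, not our actual bill):

```python
# per-call vs per-month, toy numbers
calls_per_day = 2000          # an always-on agent polling and summarizing
hosted_cost_per_call = 0.004  # dollars per call, illustrative
box_cost_per_month = 69.0     # flat rent on the self-hosted box

hosted_monthly = calls_per_day * 30 * hosted_cost_per_call  # 240.0
print(hosted_monthly > box_cost_per_month)  # True: per-call loses at always-on rates
```

the flat line wins whenever call volume is high and per-call value is low, which is exactly the "boring agent" profile.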
i built a quick cost calculator for agent workloads that compares the two for a given call volume. basic, but it's saved me a few bad decisions.
curious what other small teams are running. if you've got a self-host story with real numbers, drop it below. learning out loud beats reading another listicle about which agent platform "won."
i'm tijo, i build Rapid Claw (managed AI agents for non-technical operators). brandon does the infra. that's the whole company.
The kernel-snapshot anecdote earned a bookmark. The "quick host update without snapshot" mistake is one I'd repeat the moment I forgot I read this.
Two things to add from a much smaller scale (Molt is week 3, two agents, way fewer dollars on the line):
Log volume scales weirder than people expect. Even at toy size we hit ~12 MB/day of structured Claude responses + retries + tool traces. Free on hosted APIs. Self-hosted, shipping to loki/seq adds a small line item that pays for itself the first time you debug a silent agent stall at 2 AM.
The "wake on the same minute" trap. Even with 2 cron-scheduled agents firing at :00, latency spikes show up — Postgres connection-pool contention, TLS handshakes piling up. I'd guess it gets brutal at 5+. Curious whether your supervisor has a jitter knob, or you're hand-staggering.
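For concreteness, the jitter I have in mind is nothing fancier than a deterministic per-agent offset (Python sketch; agent names are hypothetical):

```python
import hashlib

def stable_jitter_s(agent_name: str, window_s: int = 55) -> int:
    """Deterministic offset in [0, window_s): each agent sleeps this long past :00."""
    digest = hashlib.sha256(agent_name.encode()).hexdigest()
    return int(digest, 16) % window_s
```

Deterministic beats random here: the offsets survive restarts, so you're not re-rolling the dice every deploy.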
Genuine question: when hosted prices drop another 30-40% (rumored Q3), does the math still pencil at your two-person scale, or does the time-arbitrage piece flip in 6-12 months?
(Building Molt and curating agent skills on TokRepo on the side — same cost-of-running questions hit both.)
What you’re describing is the first real split in agent infrastructure:
hosted models for high-variance reasoning
self-hosted agents for boring, always-on work
That’s the right architecture.
Most teams still treat “AI infra” like one pricing decision when it’s really two:
Hosted wins on judgment.
Self-host wins on repetition.
The teams that figure that split out early are going to have much better margins than the ones brute-forcing every recurring task through APIs.
Rapid Claw is directionally right, but the current name still sounds more like a fast automation tool than the control layer sitting above long-running agent ops.
If you keep leaning into agent orchestration / always-on operator infrastructure, Vroth.com would carry that category much harder than Rapid Claw.
—😅