
hey everyone
a "can i run AI locally?" thread hit 1,340 points on hacker news this week, and a macOS-native agent sandbox got 753. the self-host wave isn't a 2024 hobbyist thing anymore. real founders are building real workloads on their own boxes.
we've been running 5 agents in production for 8 weeks at our boutique two-person shop. me on the front end, brandon on infra. nothing fancy. one rented box, one supervisor process, five long-running agents doing different things. wrote up the actual numbers because i'm tired of reading guides written by people who've never paid the inference bill themselves.
one $69/mo MicroVM with sudo. five agents running concurrently:
before this we were paying ~$340/mo across openrouter, an automation SaaS, and a research tool. swap-in cost was about a week of brandon's time. month one was rough. months two and three were boring in a good way.
week 3, the research agent ate all the RAM. classic context-window leak in a long-running loop. brandon caught it because we had basic monitoring wired up on the agents before launch. lesson: don't run anything for more than 24 hours without memory and token counters on a dashboard. the SaaS world hides this from you. when you self-host, "running for a month" reveals every leak.
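the counters don't need to be fancy. here's a minimal python sketch of the kind of watchdog i mean (stdlib only; the thresholds are made up, not our actual config):

```python
import resource

# hypothetical budgets -- tune for your box and your model
MAX_RSS_MB = 2048
MAX_TOKENS_PER_RUN = 200_000

class BudgetExceeded(RuntimeError):
    pass

def rss_mb() -> float:
    # ru_maxrss is kilobytes on linux
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024

def check_budgets(tokens_used: int) -> None:
    """call once per agent loop iteration; raise instead of leaking for a month."""
    if rss_mb() > MAX_RSS_MB:
        raise BudgetExceeded(f"RSS {rss_mb():.0f} MB over {MAX_RSS_MB} MB cap")
    if tokens_used > MAX_TOKENS_PER_RUN:
        raise BudgetExceeded(f"{tokens_used} tokens over {MAX_TOKENS_PER_RUN} cap")
```

the point is the raise: a crashed agent shows up on a dashboard, a leaking one doesn't.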
week 5 we tried hermes 3 70B for the coding agent. fast but kept hallucinating import paths. swapped back to a hosted claude call for that one job. mixed stack works fine. purity isn't a virtue when you're shipping.
week 6, brandon did a kernel update on the host without snapshotting first. lost 4 hours of agent state. we now snapshot every night, takes 90 seconds. if you're going to self-host, this is non-negotiable.
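the guard we bolted on afterwards is basically this (python sketch; the marker path is hypothetical, the idea is your snapshot job touches it and risky host work refuses to run without a fresh one):

```python
import os
import time

SNAPSHOT_MARKER = "/var/run/last-snapshot"  # hypothetical: touched by the nightly snapshot job

def snapshot_is_fresh(max_age_s: int = 86_400, marker: str = SNAPSHOT_MARKER) -> bool:
    """gate risky host work (kernel updates!) on a snapshot newer than max_age_s."""
    try:
        return (time.time() - os.path.getmtime(marker)) < max_age_s
    except FileNotFoundError:
        # no marker means no snapshot -- refuse
        return False
```

wire it into whatever runs your update script: `if not snapshot_is_fresh(): sys.exit("snapshot first")`.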
dollars in vs dollars saved over 8 weeks:
these savings don't include the 6 hours brandon spent on the OOM and the kernel snapshot story. real boutique math: wash on month 1, breakeven by week 6, gravy after that.
self-hosting AI agents is a deal once you accept it's not free. cheaper than a stack of SaaS subs, but it costs co-founder hours. if you don't have a brandon, don't do it. pay the 3x premium and stay sane.
if you do have someone who likes infra, the killer move isn't running gpt-class models locally. it's running 4 boring agents 24/7 doing things that would be too expensive on hosted APIs at always-on rates. SaaS charges per-call. self-host charges per-month. for always-on workloads the math flips fast.
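back-of-napkin version of how the math flips (every number here is illustrative, not our actual bill):

```python
# per-call vs per-month, toy numbers
calls_per_day = 2000          # an always-on agent polling and summarizing
hosted_cost_per_call = 0.004  # dollars per call, illustrative
box_cost_per_month = 69.0     # flat rent on the self-hosted box

hosted_monthly = calls_per_day * 30 * hosted_cost_per_call  # 240.0
print(hosted_monthly > box_cost_per_month)  # True: per-call loses at always-on rates
```

the flat line wins whenever call volume is high and per-call value is low, which is exactly the "boring agent" profile.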
i built a quick cost calculator for agent workloads that compares the two for a given call volume. basic, but it's saved me a few bad decisions.
curious what other small teams are running. if you've got a self-host story with real numbers, drop it below. learning out loud beats reading another listicle about which agent platform "won."
i'm tijo, i build Rapid Claw (managed AI agents for non-technical operators). brandon does the infra. that's the whole company.
The kernel-snapshot anecdote earned a bookmark. The "quick host update without snapshot" mistake is one I'd repeat the moment I forgot I read this.
Two things to add from a much smaller scale (Molt is week 3, two agents, way fewer dollars on the line):
Log volume scales weirder than people expect. Even at toy size we hit ~12 MB/day of structured Claude responses + retries + tool traces. Free on hosted APIs. Self-hosted, shipping to loki/seq adds a small line item that pays for itself the first time you debug a silent agent stall at 2 AM.
The "wake on the same minute" trap. Even with 2 cron-scheduled agents firing at :00, latency spikes show up — Postgres connection-pool contention, TLS handshakes piling up. I'd guess it gets brutal at 5+. Curious whether your supervisor has a jitter knob, or you're hand-staggering.
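For concreteness, the jitter I have in mind is nothing fancier than a deterministic per-agent offset (Python sketch; agent names are hypothetical):

```python
import hashlib

def stable_jitter_s(agent_name: str, window_s: int = 55) -> int:
    """Deterministic offset in [0, window_s): each agent sleeps this long past :00."""
    digest = hashlib.sha256(agent_name.encode()).hexdigest()
    return int(digest, 16) % window_s
```

Deterministic beats random here: the offsets survive restarts, so you're not re-rolling the dice every deploy.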
Genuine question: when hosted prices drop another 30-40% (rumored Q3), does the math still pencil at your two-person scale, or does the time-arbitrage piece flip in 6-12 months?
(Building Molt and curating agent skills on TokRepo on the side — same cost-of-running questions hit both.)
What you’re describing is the first real split in agent infrastructure:
hosted models for high-variance reasoning
self-hosted agents for boring, always-on work
That’s the right architecture.
Most teams still treat “AI infra” like one pricing decision when it’s really two:
Hosted wins on judgment.
Self-host wins on repetition.
The teams that figure that split out early are going to have much better margins than the ones brute-forcing every recurring task through APIs.
Rapid Claw is directionally right, but the current name still sounds more like a fast automation tool than the control layer sitting above long-running agent ops.
If you keep leaning into agent orchestration / always-on operator infrastructure, Vroth.com would carry that category much harder than Rapid Claw.
—😅