2
3 Comments

Self-hosting agents went mainstream this week. here's what 8 weeks of running 5 actually cost us

self-hosting agents went mainstream

hey everyone

a "can i run AI locally?" thread hit 1,340 points on hacker news this week, and a macOS-native agent sandbox got 753. the self-host wave isn't a 2024 hobbyist thing anymore. real founders are building real workloads on their own boxes.

we've been running 5 agents in production for 8 weeks at our boutique two-person shop. me on the front end, brandon on infra. nothing fancy. one rented box, one supervisor process, five long-running agents doing different things. wrote up the actual numbers because i'm tired of reading guides written by people who've never paid the inference bill themselves.

the setup

one $69/mo MicroVM with sudo. five agents running concurrently:

  • a coding agent that handles small PRs from our jira board
  • a research agent that keeps a notion knowledge base current
  • a content agent that drafts the first pass of every humanai.news article (then i rewrite half of it)
  • a support triage agent on our shared inbox
  • a SEO crawler that watches 40 competitor pages daily

before this we were paying ~$340/mo across openrouter, an automation SaaS, and a research tool. swap-in cost was about a week of brandon's time. month one was rough. months two and three were boring in a good way.

what broke

week 3, the research agent ate all the RAM. classic. context window leak in a long-running loop. brandon caught it because we had basic monitoring AI agents wired up before launch. lesson: don't run anything for more than 24 hours without basic memory and token counters on a dashboard. the SaaS world hides this from you. when you self-host, "running for a month" reveals every leak.

week 5 we tried hermes 3 70B for the coding agent. fast but kept hallucinating import paths. swapped back to a hosted claude call for that one job. mixed stack works fine. purity isn't a virtue when you're shipping.

week 6, brandon did a kernel update on the host without snapshotting first. lost 4 hours of agent state. we now snapshot every night, takes 90 seconds. if you're going to self-host, this is non-negotiable.

the honest math

dollars in vs dollars saved over 8 weeks:

  • infra: $69/mo box, $0 in API fees for the 3 agents that run on local models
  • saved: $340/mo in tools we cancelled, plus ~12 hrs/week of manual research and triage
  • net run rate: ~$95/mo for what was costing $340/mo plus my time

these savings don't include the 6 hours brandon spent on the OOM and the kernel snapshot story. real boutique math: wash on month 1, breakeven by week 6, gravy after that.

what i'd tell another solo founder

self-hosting AI agents is a deal once you accept it's not free. cheaper than a stack of SaaS subs, but it costs co-founder hours. if you don't have a brandon, don't do it. pay the 3x premium and stay sane.

if you do have someone who likes infra, the killer move isn't running gpt-class models locally. it's running 4 boring agents 24/7 doing things that would be too expensive on hosted APIs at always-on rates. SaaS charges per-call. self-host charges per-month. for always-on workloads the math flips fast.

i built a quick AI agent cost calculator that compares the two for a given workload. basic but it's saved me a few bad decisions.

what i still don't know

  • whether 5 agents on one box scales to 8 without a second box (probably not, haven't pushed yet)
  • whether the nightly kernel snapshot survives a real disk failure (haven't tested, scares me)
  • whether the 2026 hosted model price drops eat this margin in 6 months

curious what other small teams are running. if you've got a self-host story with real numbers, drop it below. learning out loud beats reading another listicle about which agent platform "won."

i'm tijo, i build Rapid Claw (managed AI agents for non-technical operators). brandon does the infra. that's the whole company.

posted to Icon for group Building in Public
Building in Public
on May 5, 2026
  1. 1

    The kernel-snapshot anecdote earned a bookmark. The "quick host update without snapshot" mistake is one I'd repeat the moment I forgot I read this.

    Two things to add from a much smaller scale (Molt is week 3, two agents, way fewer dollars on the line):

    1. Log volume scales weirder than people expect. Even at toy size we hit ~12 MB/day of structured Claude responses + retries + tool traces. Free on hosted APIs. Self-hosted, shipping to loki/seq adds a small line item that pays for itself the first time you debug a silent agent stall at 2 AM.

    2. The "wake on the same minute" trap. Even with 2 cron-scheduled agents firing at :00, latency spikes show up — Postgres pool, SSL handshakes. I'd guess it gets brutal at 5+. Curious if your supervisor has a jitter knob, or you're hand-staggering.

    Genuine question: when hosted prices drop another 30-40% (rumored Q3), does the math still pencil at your two-person scale, or does the time-arbitrage piece flip in 6-12 months?

    (Building Molt and curating agent skills on TokRepo on the side — same cost-of-running questions hit both.)

  2. 1

    What you’re describing is the first real split in agent infrastructure:

    hosted models for high-variance reasoning
    self-hosted agents for boring, always-on work

    That’s the right architecture.

    Most teams still treat “AI infra” like one pricing decision when it’s really two:

    1. expensive thinking
    2. cheap persistence

    Hosted wins on judgment.
    Self-host wins on repetition.

    The teams that figure that split out early are going to have much better margins than the ones brute-forcing every recurring task through APIs.

    Rapid Claw is directionally right, but the current name still sounds more like a fast automation tool than the control layer sitting above long-running agent ops.

    If you keep leaning into agent orchestration / always-on operator infrastructure, Vroth.com would carry that category much harder than Rapid Claw.

Trending on Indie Hackers
Agencies charge $5,000 for a 60-second product demo video. I make mine for $0. Here's the exact workflow. User Avatar 84 comments I wasted 6 months building a failed startup. Built TrendyRevenue to validate ideas in 10 seconds. User Avatar 53 comments Your files aren’t messy. They’re just stuck in the wrong system. User Avatar 28 comments Built a tool that finds which Reddit/HN threads are making ChatGPT recommend your competitors User Avatar 26 comments Why Direction Matters More Than Motivation in Exam Preparation User Avatar 14 comments I built a health platform for my family because nobody has a clue what is going on User Avatar 13 comments