
hey everyone
the "can i run ai locally?" thread that hit 1,340 on hn this weekend put us on the spot. we sell managed agent hosting. so when half the comments are people saying they're moving everything to a used 3090 and a tailscale tunnel, the question i had to be honest about is: at what point does self-hosting actually win for an sme?
i spent saturday running the numbers on our own fleet instead of guessing. it's me and brandon. we run five agents in production. one drafts customer onboarding emails for a boutique consultancy. two writing-and-research agents for our own content pipeline. one schedules and posts. one is a janitor agent on a tiny supabase. that is the entire fleet. april mrr was $4,180. this is not big-co math.
here's the comparison.
what we pay now (managed)
the five agents share infrastructure that runs us about $69/mo on the builder tier plus token spend that came in at $147 last month across openrouter and venice. $216/mo all-in for inference and hosting. brandon spends maybe two hours a week on it and that's mostly because he wants to, not because something's broken.
what self-hosting would cost
a used 3090 runs $650 to $900 on ebay right now. a workstation chassis with the psu, board, ram, ssd, and cooling we'd actually trust 24/7 puts us around $2,400 built. add a small ups, cabling, and the cost of brandon's time setting it up (illustrative: 12 hours at the implicit rate we owe ourselves) and we're at roughly $3,200 to walk in the door.
electricity at our local rate ($0.17/kwh) with the gpu pulling about 250w under our actual workload duty cycle is around $30/mo (roughly 180 kwh). swapping to nemotron 3 super or hermes-style local models drops the openrouter bill, call it $120/mo back. net of the $30 in power, the recurring delta is around $90/mo in favor of self-hosting.
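if you want to plug in your own rate, here's a minimal sketch of that arithmetic. the function names are made up for illustration, and it assumes a 30-day month with the box averaging the stated wattage around the clock.

```python
HOURS_PER_MONTH = 24 * 30  # assumption: 30-day month, box powered on 24/7


def monthly_electricity_cost(avg_watts: float, rate_per_kwh: float) -> float:
    """Monthly power cost in dollars for a box averaging avg_watts."""
    kwh = avg_watts / 1000 * HOURS_PER_MONTH
    return kwh * rate_per_kwh


def recurring_delta(api_savings: float, avg_watts: float, rate_per_kwh: float) -> float:
    """Monthly savings from dropped API spend, net of electricity."""
    return api_savings - monthly_electricity_cost(avg_watts, rate_per_kwh)


# numbers from the post: 250w average, $0.17/kwh, ~$120/mo of API spend avoided
power = monthly_electricity_cost(250, 0.17)
delta = recurring_delta(120, 250, 0.17)
print(f"power: ${power:.2f}/mo, net delta: ${delta:.2f}/mo")
```

with the post's inputs this lands at about $30.60 for power and $89.40 net, which is where the rounded $30 and $90 come from.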
the crossover
$3,200 upfront divided by $90/mo savings is about 36 months. that's not catastrophic. that's also not great. and the math gets meaningfully worse if you weight brandon's hours honestly. our two hours a week on managed becomes, optimistically, five hours a week in month one of self-hosted and drops to maybe three at steady state once the runbook is built. one of the hidden costs of self-hosting AI agents is that the runbook is a thing you and only you maintain.
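a rough sketch of the payback math with ops time priced in. the hourly rate is a placeholder i picked for illustration, not a number from our books, and the function name is hypothetical. extra hours are the steady-state difference (three self-hosted minus two managed, so one).

```python
from typing import Optional

WEEKS_PER_MONTH = 52 / 12  # average weeks in a month


def payback_months(
    upfront: float,
    monthly_savings: float,
    extra_hours_per_week: float = 0.0,
    hourly_rate: float = 0.0,
) -> Optional[float]:
    """Months to recover upfront cost; None if labor eats all the savings."""
    labor_per_month = extra_hours_per_week * WEEKS_PER_MONTH * hourly_rate
    net = monthly_savings - labor_per_month
    if net <= 0:
        return None  # never pays back under these inputs
    return upfront / net


# hardware-only view from the post: $3,200 upfront, $90/mo saved
print(payback_months(3200, 90))

# same thing with one extra ops hour per week at an ASSUMED $50/hour
print(payback_months(3200, 90, extra_hours_per_week=1, hourly_rate=50))
```

the first call gives roughly 35.6 months. the second returns None, because one extra hour a week at that assumed rate costs more per month than the $90 being saved. under this sketch the break-even rate is a bit under $21/hour.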
the real answer, and where i think the trend is right
for our fleet, on our actual usage, managed wins. but the part i didn't want to admit is that the self-host wave isn't wrong. there are a few real categories where the math flips fast:
anyone running 24/7 inference on small models. if your duty cycle is 90% instead of our ~15%, the openrouter savings are roughly 6x. if the net delta scales the same way ($90 to about $540/mo), payback on $3,200 drops to about 6 months.
privacy-bound workloads. you can't price the legal review cost of "is this data leaving the building" until you've paid it.
founders who already own the hardware. our math assumed greenfield. if you've got a 3090 sitting under your desk, upfront is zero and the only question is "do i want this to be a part of my job."
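the duty cycle and already-own-the-hardware cases above can be sketched as a small sweep. big simplification, stated loudly: this assumes the net monthly delta scales linearly with duty cycle from our ~15% baseline, which ignores that power draw also climbs as the gpu runs hotter. names are hypothetical.

```python
BASE_DUTY = 0.15   # our approximate duty cycle, from the post
BASE_DELTA = 90.0  # our net monthly savings at that duty cycle, in dollars


def net_delta(duty_cycle: float) -> float:
    """Net monthly savings, ASSUMING it scales linearly with duty cycle."""
    if duty_cycle <= 0:
        raise ValueError("duty_cycle must be positive")
    return BASE_DELTA * duty_cycle / BASE_DUTY


def payback(duty_cycle: float, upfront: float = 3200.0) -> float:
    """Months to recover the upfront spend at a given duty cycle."""
    return upfront / net_delta(duty_cycle)


for duty in (0.15, 0.30, 0.50, 0.90):
    print(f"duty {duty:.0%}: greenfield {payback(duty):5.1f} mo, "
          f"own hardware {payback(duty, upfront=0.0):.1f} mo")
```

at 90% the greenfield payback comes out a little under 6 months, and with hardware you already own it is zero at any duty cycle, which is why that case comes down to whether you want the ops work.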
the calculator we built (it's open: an AI agent cost calculator that takes duty cycle and electricity rate seriously, plus a longer GPU cost framework writeup with the assumptions) keeps surprising me with how non-obvious the crossover is for sme-scale workloads. it is almost never the headline hardware price that decides.
the dgx spark drop and nemotron 3 super made the self-host pitch loud this month. for most smes we talk to, the loud part isn't where the money is. it's the duty cycle that decides, and most sme fleets just don't run hot enough.
curious where everyone else lands. anyone self-hosting at sme scale (not enterprise, not hobby) and tracking the real ops time? would love to compare runbooks.
Asking the crossover question out loud is the honest move, since most of these threads compare a hardware bill to a SaaS bill and skip the part where you become the on-call. For a two-person team the real cost of self-hosting is the Saturday you just spent on it, plus the next one when something breaks at a bad time.