Two years ago, we were a normal-ish dev shop billing time and materials, same as everyone else we competed against. Today, our median time from kickoff to a working MVP is 38 days against a 120+ day industry norm (about 68% less calendar time), and we've shipped 300+ products for clients in 21 countries on a fixed-price model. I want to write down how we actually got here, because the honest version has more failed experiments in it than the case study version does.
It wasn't "AI is cool, let's use it." It was simpler and more annoying than that: we kept losing deals on timeline, not price. A client would like our team, like our portfolio, and then ask, "How fast can you ship a working v1?" Our honest answer, 4 to 6 months, would lose us the deal to someone promising 6 weeks, even when we both knew the 6-week promise was optimistic. We were pricing and scheduling like it was still 2019.
At the same time, billing by the hour was starting to feel dishonest in a specific way: our senior engineers were visibly faster with AI tools than they'd been a year earlier, and we were still invoicing the old number of hours for the new amount of work. Clients weren't dumb. A couple of them asked directly why a feature that "should" take a week was taking a week of billed time when the engineer was clearly using AI to write half of it.
That tension — losing deals on speed, and feeling weird about billing hours that didn't reflect what AI was actually doing — is what pushed us to rebuild the whole delivery model instead of just adding Copilot licenses and calling it a day.
Our first instinct was the lazy one: give the existing team AI coding tools, keep the same T&M structure, and let speed be a nice internal bonus we didn't pass on to clients. That lasted about a quarter. Two things broke it.
So the AI-tools-on-top-of-the-old-model experiment didn't fail because AI didn't work. It failed because we hadn't changed the contract, the QA process, or the incentive structure to match what the tools actually changed.
Once we admitted the old structure was the actual bottleneck, the harder problem showed up: if engineers are moving faster and agents are doing more of the first-pass work, who's checking it, and how do you prove that to a client who's paying a fixed price and can't see your team's hours anymore?
This is the part that took us the longest to get right, longer than the pricing model change itself. We had to build what we now call our QA pipeline almost from scratch — not a manual QA step bolted onto the end of a sprint, but a structured review layer that runs alongside the AI-assisted build, with checkpoints a human actually has to sign off on before anything ships. We open-sourced our thinking on this as the Agentic QA Pipeline once we'd actually used it on enough client projects to trust it, partly because other shops kept asking us the same question we'd been asking ourselves: how do you go fast without it being reckless?
We got this wrong at least twice before it worked — once by under-investing in it (too few checkpoints, caught issues too late) and once by over-investing in it (so many review gates that we'd recreated the slow timeline we were trying to escape, just with extra steps). The version that's working now sits somewhere in the middle, and we're still tuning it.
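To make "checkpoints a human actually has to sign off on" concrete, here is a minimal sketch of the gating idea in Python. It is an illustration only, not the actual Agentic QA Pipeline: the `Checkpoint` and `Release` classes, the checkpoint names, and the reviewer names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Checkpoint:
    """One review gate. Names are illustrative, not the real pipeline's."""
    name: str
    approved_by: Optional[str] = None  # stays None until a human signs off

    def sign_off(self, reviewer: str) -> None:
        # Require a named reviewer so an approval can be traced to a person.
        if not reviewer.strip():
            raise ValueError("a sign-off needs a named human reviewer")
        self.approved_by = reviewer


@dataclass
class Release:
    """A deliverable that may ship only when every checkpoint is approved."""
    checkpoints: List[Checkpoint] = field(default_factory=list)

    def pending(self) -> List[str]:
        # Checkpoints still waiting on a human, in pipeline order.
        return [c.name for c in self.checkpoints if c.approved_by is None]

    def can_ship(self) -> bool:
        # A release with no checkpoints is treated as unreviewed, not as clear.
        return bool(self.checkpoints) and not self.pending()


if __name__ == "__main__":
    release = Release([Checkpoint("scope review"), Checkpoint("security review")])
    release.checkpoints[0].sign_off("alice")
    print(release.pending())
    print(release.can_ship())
```

The number of entries in `checkpoints` is the knob we got wrong in both directions: too few and problems surface late, too many and the gates themselves become the timeline.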
The real shift wasn't "we added AI." It was three things happening together: the AI tooling, the fixed-price contract, and the QA layer that lets us stand behind both.
None of these three things would have worked alone. Fixed pricing without the QA layer is how you eat losses on broken deliverables. AI tooling without fixed pricing is how you keep losing the speed argument to whoever's willing to quote optimistically. The combination is what got us to a number we could actually stand behind.
300+ products shipped, clients in 21 countries, median delivery of 38 days against an industry norm we still see quoted at 120+ days for comparable scope. We didn't get those numbers from one big redesign — they're the compounding result of probably a dozen smaller fixes to the QA pipeline, the scoping process, and how we estimate fixed-price work, made over about two years.
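For anyone checking the headline percentage, the arithmetic is short. This sketch uses 120 days as the baseline, which is an assumption: the norm is quoted as "120+", so the real reduction is at least this large. The function names are just for illustration.

```python
def time_reduction(baseline_days: float, actual_days: float) -> float:
    """Fraction of calendar time saved relative to the baseline."""
    return (baseline_days - actual_days) / baseline_days


def speedup(baseline_days: float, actual_days: float) -> float:
    """How many times faster delivery is than the baseline."""
    return baseline_days / actual_days


if __name__ == "__main__":
    print(f"{time_reduction(120, 38):.0%}")  # 68%
    print(f"{speedup(120, 38):.1f}x")        # 3.2x
```

So the 38-day median is about 68% less calendar time, or roughly a 3.2x speedup, depending on which way you prefer to state it.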
The thing I'd tell another founder looking at this: the AI part is the easy 20%. The contract model, the QA discipline, and the willingness to eat the cost of getting the governance wrong a couple of times before it's right — that's the other 80%, and it's the part nobody puts in the pitch deck.
Happy to answer questions on the QA pipeline specifically, or on how we structure the fixed-price scoping conversation with clients who've only ever bought T&M before — that conversation took us longer to get right than the engineering did.
Tags: #development #tech-stack #agency #artificial-intelligence #productivity #bootstrapping