How we built a fixed-price engineering model that actually scales, and why we stopped billing hourly three years ago

I want to tell you the honest version of this story, not the polished one.

The polished version goes: we saw the future of AI-native delivery, designed the perfect model, and executed flawlessly. That's not what happened. What happened is that we kept running into the same problem with hourly billing, got frustrated enough to restructure everything, broke several things in the process, and eventually arrived at something that works.

Here's the actual story.

Why hourly billing kept creating the wrong dynamics

When you bill hourly, your revenue is hours multiplied by the rate. That sounds neutral. It isn't.

It means that when a project takes longer, you make more money. It means that when AI tooling cuts implementation time by 40%, you face a choice between charging less (which punishes you for being efficient) or charging the same (which quietly extracts the efficiency gain from the client). Neither option is honest. Neither creates a partnership.

We were not maliciously exploiting this dynamic. But the structure was there, and it shaped how we behaved in ways we didn't always notice. Scope ambiguity got resolved in development rather than in discovery, because discovery time was billable and ambiguity created more billable development work downstream. Not intentionally, but structurally.

The moment we started integrating AI seriously into our workflows, in late 2023, the dishonesty became impossible to ignore. We were compressing 40-hour implementation tasks into 20-hour ones. We could bill 40 hours and pocket the difference, or bill 20 hours and take a revenue hit. Neither felt right.
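A minimal sketch of that arithmetic. The 40 and 20 hours are the figures above; the rate of 150 per hour is a made-up number for illustration only, not an actual rate.

```python
# Illustrative only: HOURLY_RATE is a hypothetical figure, not a real rate.
HOURLY_RATE = 150


def hourly_revenue(billed_hours, rate=HOURLY_RATE):
    """Revenue under hourly billing: hours multiplied by the rate."""
    return billed_hours * rate


hours_before_ai = 40  # task length before AI tooling (from the text)
hours_with_ai = 20    # the same task with AI tooling (from the text)

# Option A: bill the old 40 hours and pocket the difference.
bill_old_hours = hourly_revenue(hours_before_ai)
# Option B: bill the 20 hours actually worked and take the revenue hit.
bill_actual_hours = hourly_revenue(hours_with_ai)

revenue_gap = bill_old_hours - bill_actual_hours
print(f"Bill 40h: {bill_old_hours}  Bill 20h: {bill_actual_hours}  Gap: {revenue_gap}")
```

Whatever rate you plug in, the gap is the same share of revenue: halving the hours halves what an honest hourly invoice can say.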

So we decided to stop billing hours entirely.

What fixed price actually required us to build

The decision was easy. The execution took about eight months of painful iteration.

First thing we had to build: a real scope governance system.

Fixed price without a precise scope is not a business model. It's a recipe for delivering something adjacent to what the client wanted and calling it done. We'd seen that failure mode from the client side; it's why "fixed-price" had a bad reputation among sophisticated buyers.

So we built a discovery sprint process that produces a single governing document before any development starts: a line-item scope agreement with explicit inclusions, explicit exclusions, and a formal change order process for anything outside it. Every ambiguity in that document is a future dispute, so we invested heavily in making it unambiguous.
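One way to picture that document is as a small data structure. This is a hypothetical sketch, not the real template: the class name, field names, routing labels, and sample line items are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ScopeAgreement:
    # Hypothetical structure: field names are illustrative assumptions.
    inclusions: set = field(default_factory=set)
    exclusions: set = field(default_factory=set)

    def route(self, item):
        """Decide how a requested item is handled under the agreement."""
        if item in self.inclusions:
            return "deliver"  # explicitly included line item
        if item in self.exclusions:
            return "change_order"  # explicitly excluded: formal change order
        # Listed nowhere: still outside scope, and a sign the document had a gap.
        return "change_order_unlisted"


scope = ScopeAgreement(
    inclusions={"user login", "invoice export"},
    exclusions={"mobile app"},
)
for request in ["invoice export", "mobile app", "dark mode"]:
    print(request, "->", scope.route(request))
```

The third branch is the interesting one. A request that matches neither list is exactly the kind of ambiguity that turns into a dispute later, so a sketch like this treats it as its own case rather than folding it into the exclusions.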

Writing that document well, precisely enough that no reasonable person could interpret it two ways, turned out to be the hardest skill to develop across the team. It required the kind of systems thinking that doesn't come naturally to engineers who are used to discovering product decisions through implementation.

We got better at it through iteration. The first scope document that held up through a full engagement without a single disputed inclusion was a genuine cause for celebration. Getting there took about six months.

Second thing: the financial model had to change completely.

Under fixed-price, your margin is price minus delivery cost. Every internal efficiency gain goes directly to margin. Every AI workflow improvement, every governance investment, every process refinement, all show up in the margin rather than being billed away.

This created a completely different incentive structure. We now profit from shipping faster. That means every dollar we invest in improving our delivery workflow pays back in margin on the next engagement. The incentive to avoid AI adoption, which existed subtly under time-and-materials (T&M) billing because efficiency meant billing less, completely disappeared.

This is the alignment that makes the model honest. Not as a stated value, but as a structural reality.
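The margin relationship is simple enough to sketch. The price, internal cost per hour, and hour counts below are hypothetical numbers chosen only to make the arithmetic visible.

```python
def fixed_price_margin(price, delivery_cost):
    """Margin under fixed price: price minus delivery cost."""
    return price - delivery_cost


# Hypothetical figures for illustration only.
PRICE = 60_000        # agreed fixed price for the engagement
COST_PER_HOUR = 100   # internal delivery cost per hour
hours_before = 400    # delivery effort before a workflow improvement
hours_after = 300     # delivery effort after the improvement

margin_before = fixed_price_margin(PRICE, hours_before * COST_PER_HOUR)
margin_after = fixed_price_margin(PRICE, hours_after * COST_PER_HOUR)

margin_gain = margin_after - margin_before
cost_saved = (hours_before - hours_after) * COST_PER_HOUR
print(f"Margin before: {margin_before}  after: {margin_after}")
print(f"Margin gain: {margin_gain}  cost saved: {cost_saved}")
```

Because the price is fixed, the margin gain equals the cost saved, one for one. Under hourly billing the same 100 saved hours would have reduced the invoice instead.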

Third: AI Velocity Pods, the delivery model a fixed-price commitment requires.

We couldn't make fixed-price commitments at scale without a repeatable delivery model. What we built is what we now call an AI Velocity Pod: a small cross-functional unit of 3–6 humans governing a structured layer of specialised AI agents.

The Pod Lead defines intent and gate criteria, not code. The Review Engineer validates agent output at milestones. The QA Orchestrator manages the continuous test pipeline. The Domain Expert carries the contextual knowledge AI can't infer.

The agent layer handles first-draft implementation, test generation, documentation, and first-pass code review. It does so under human governance, with review gates at defined milestones and QA designed specifically for AI-generated defect patterns.

The 38-day median delivery number comes from this model. It comes not from AI writing code faster, but from a workflow that eliminates the coordination overhead and sequential bottlenecks that burn 35–40% of traditional team capacity.

What broke along the way

Scope agreements that weren't precise enough. The first several fixed-price engagements had scope documents we thought were detailed but that contained ambiguities that only surfaced mid-sprint. Each one became a difficult conversation under deadline pressure. We now have a scope review checklist — a second-pass audit looking for unstated assumptions, ambiguous inclusions, and undefined edge cases. It's mandatory on every engagement.

Deploying AI agents before agentic QA. We stood up the agentic development pipeline before we had the QA infrastructure to govern it. We shipped faster and caught defects later. AI-generated code fails at the edges in patterns that traditional QA doesn't catch: edge-case blindness and context collapse across module boundaries. We rebuilt the sequencing: QA first, always.

Absorbing scope additions informally. Without a formal change order process, small client requests got absorbed mid-sprint. They were reasonable things, small things, but collectively they eroded the fixed-price model from within. We introduced written change orders for anything outside the scope of the agreement. It felt adversarial the first time we used it. The client understood immediately once we explained that the alternative was T&M drift. Now it's just how we work.

Internal pressure to offer hourly fallbacks. Every sales conversation in which a client resisted a fixed price created internal pressure to offer a T&M option. We held the line because every exception requires an exception to scope governance, which breaks the model. Some of those deals closed with hourly competitors. Some of them came back to us six months later, over budget and underdelivered.

The outcome data after three years

Median delivery: 38 days across 300+ products
21 countries across diverse sectors and regulatory environments
Change order rate: under 15% — the metric that tells us scope governance is actually working
Cost to clients: 70–85% lower than traditional agency engagements on comparable scope

The number I'm proudest of is the change order rate, not the delivery time. Low change orders mean we're doing the hard work upstream — catching everything that needs to be defined before development starts. That's the discipline the whole model depends on.

The open question I'm still sitting with

Three years in, the model works. But one friction point remains.

Enterprise procurement systems are built for hourly billing. Vendor evaluation forms ask for day rates. MSA templates assume T&M. Getting a fixed-price, outcome-based contract through a procurement process designed for hourly billing requires either changing the buyer's internal process or translating our model into terms their system can accommodate.

We've gotten better at the latter — framing fixed-price as a milestone payment schedule with defined deliverables at each gate, which most procurement systems can handle. But it adds friction that shouldn't exist.
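A rough sketch of that translation: splitting one fixed price into payments tied to gates. The gate names, percentages, and price are hypothetical, and a real schedule would follow whatever the contract defines.

```python
def milestone_schedule(total_price, gates):
    """Split a fixed price across delivery gates.

    gates: list of (deliverable, percent) pairs whose percents sum to 100.
    Returns a list of (deliverable, amount) pairs in whole currency units.
    """
    if sum(pct for _, pct in gates) != 100:
        raise ValueError("gate percentages must sum to 100")
    schedule = []
    allocated = 0
    for i, (deliverable, pct) in enumerate(gates):
        if i == len(gates) - 1:
            # The last gate absorbs any rounding remainder so the total is exact.
            amount = total_price - allocated
        else:
            amount = total_price * pct // 100
        allocated += amount
        schedule.append((deliverable, amount))
    return schedule


# Hypothetical gates and price, for illustration only.
gates = [
    ("Signed scope agreement", 20),
    ("Mid-build review gate passed", 40),
    ("Final delivery accepted", 40),
]
schedule = milestone_schedule(50_000, gates)
for deliverable, amount in schedule:
    print(f"{deliverable}: {amount}")
```

Each row is a line item a procurement system can attach a purchase order to, while the total still adds up to the one fixed price.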

For anyone else running outcome-based models: how are you handling buyers whose procurement systems are structurally built for T&M?


Tags: #buildinpublic #founders #ai #pricing #engineering #agencygrowth #productdevelopment


This is a strong breakdown because the real shift is not “AI makes delivery faster.” It is that fixed-price changes the incentive structure completely.

The part that feels most valuable is your scope governance layer. A lot of AI-native delivery shops talk about speed, but speed without precise scope, QA gates, and change-order discipline just creates faster chaos. Your model sounds more credible because you are showing the operating system behind the 38-day delivery claim, not just the output.

One thing I’d pressure-test is the brand/naming layer around “AI Velocity Pods.” The model is strong, but that phrase may still sound like an internal delivery method rather than a buyer-facing category. Enterprise buyers need to quickly understand that this is outcome-based product delivery with AI leverage, human governance, and fixed-price accountability.

If you want this to become a serious category, a broader brand like Beryxa .com could support that better than naming everything around pods or hourly-vs-fixed pricing. It feels more like an enterprise delivery/intelligence company, while still giving you room to package the model around scope governance, AI delivery systems, and procurement-friendly outcomes.

The core idea is already strong. I’d just make sure the naming makes the model feel like a durable category, not just a smarter agency process.

aryan_sinh · a month ago