Hey IH — sharing a problem we kept running into while using AI from multiple providers.
Using different models was easy enough at the beginning.
One workflow used OpenAI.
Another used Claude.
A few experiments used Gemini or smaller models.
Some tools had their own keys. Some scripts had separate env vars. Some calls were made through agents. Some were just quick tests that never got cleaned up.
The model calls worked.
The bill did not.
The hard part was not only “which model is cheaper?”
It was answering basic questions across providers:
Provider dashboards were useful, but they were all separate.
Each one had its own pricing, usage view, API key structure, and reporting format.
So we kept ending up with a messy spreadsheet just to understand what happened.
That is why we started building EvoLink.
The idea is simple:
one API layer for calling different AI providers,
and one usage layer for understanding where the money went.
The minimal thing we wanted to track per request:
This changed how we looked at AI spend.
Instead of asking:
“Which provider is cheaper?”
we started asking:
“Which workflow needs which model, and what did that workflow cost across all providers?”
Still early, but this has already made routing and cost discussions much clearer.
Curious how other builders are handling this.
If you use more than one AI provider, are you reconciling usage manually, relying on each provider dashboard, or already routing everything through one layer?
there is our website: https://evolink.ai/?utm_source=indiehackers&utm_medium=community_post&utm_campaign=building_in_public_cross_provider_billing_202606&utm_content=multi_provider_bill_post
This framing makes sense. In practice the retry/fallback fields are the ones I would want most, because otherwise a “model is expensive” discussion can hide the real issue: one noisy workflow or fallback loop eating the budget.
Have you found workflow-level cost more useful than project-level cost so far?
Yes, this is a problem that many people likely have, and at first glance, no one seems to have a solution yet, at least not to my knowledge... I hope the rest of your journey is a great success; it will be a sign that the tool solves the identified problem. Good luck!
Hit this exact wall on the cloud side last year, running across AWS, GCP, and Azure, each tag things differently, bills in a different timezone, and refunds show up at random. We ended up building an internal mapping layer just so finance would trust the numbers. Curious how you're handling metadata across the 4 providers. Did you normalize
Everything to one taxonomy or keep them separate with a join layer on top?
That cloud analogy is very close to how we think about this.
The direction is a shared taxonomy on top, while still preserving provider-specific fields underneath. So teams can compare usage across providers without losing the raw details needed for debugging or billing edge cases.
The tricky part is deciding which fields should be universal and which should stay provider-specific. Timezone, retries, cached tokens, and failed calls are exactly where it gets messy.
the messy spreadsheet reality is so incredibly real. the exact second you split your automation pipelines across openai, claude, and gemini, financial observability completely breaks down. you end up guessing which micro-feature or script testing session caused a sudden cost spike overnight.
unifying this into a single abstraction layer with structured metadata like project_id and outcome_status is the only way to scale without getting a heart attack from the provider bills. moving the conversation to 'what does this specific workflow cost across providers' is a brilliant framework shift. you nailed the problem statement perfectly. major props on getting this out there.
Exactly. The painful part is not just the bill itself, but losing the connection between cost and the actual workflow that created it.
project_id, feature, environment, provider, model, and outcome_status are the kinds of fields we think matter most.
Once that metadata exists, routing and cost control become much less guessy.
over-engineering the early architecture is a silent killer for solo projects. if a simple manual webhook can validate the core transaction flow today, do that instead of spinning up a heavy microservice stack.
keep the footprint tiny and focus entirely on getting your first few paying users. the tech debt only matters once you actually have a consistent data stream to support it.
The capture side looks solid. The part I'd push on is whether your computed cost ties back to each provider's actual invoice at the end of the month, because token counts times list price almost never matches what gets billed. Discounts on cached tokens, minimum charges, failed calls that still cost you, free credits, price changes partway through the month, conversion fees, they all open a gap between the sum of your tracked runs and what the provider really charged. People trust the workflow numbers right up until the real bill disagrees with the dashboard once, then they stop trusting any of it. I've spent a lot of time reconciling Stripe data and that gap is always where it gets hard. Do you reconcile against the provider invoices, or estimate from tokens and list prices? If you actually reconcile, I'd put that front and center, since it's the part most tools skip.
This is a very fair push. Token count × list price is useful for real-time visibility, but it is definitely not the same thing as invoice reconciliation.
This is actually something we’ve been struggling with for a while too. We haven’t found a really clean solution yet, especially once cached tokens, failed calls, provider-side discounts, free credits, and mid-month pricing changes get involved.
Right now, the layer we’re focused on first is workflow-level attribution: project, feature, provider, model, retries, fallback, and outcome. But I agree that the next step has to separate “estimated runtime cost” from “reconciled billing cost” much more clearly.
This is a real infrastructure pain because once a team uses more than one model provider, “AI cost” stops being a provider-dashboard problem and becomes a workflow-accounting problem.
The sharpest angle here is not just cheaper routing. It is visibility across AI usage: which workflow caused the spike, whether retries or fallbacks distorted the bill, and whether the stronger model actually improved the outcome enough to justify the cost. That feels much more valuable than another generic AI gateway.
One thing I’d pressure-test early is the product name. EvoLink is decent, but “link” makes it feel more like a connector layer. Your stronger category may be AI cost intelligence, model usage observability, and cross-provider routing control.
Exirra .com would fit that direction better because it feels more like an AI infrastructure and signal-intelligence brand, while still leaving room for routing, usage tracking, cost attribution, fallback analysis, and workflow-level AI spend visibility.