If your product uses AI, every click increases your costs.
Here's how to turn LLM pricing into plans, credits, and limits that won’t come back to bite you.
If you start from “tokens”, your brain melts. Start from what your user actually does.
Write down the main things in your product that use AI. For example:
Each of these is a single AI action. You’ll want to break every workflow or feature into single actions.
In your product, you will usually have two kinds of models:
Give them simple labels in your system:
These labels are only for you and your team. Users don’t choose them.
From here on, we’ll talk about Standard vs Deep, not specific model names. That way this works no matter which AI provider you use.
Use the same action list from step 1.
Now you want to know: “How much does one of these actions cost me, roughly?”
You can’t get real prices, but you can measure size.
For each action:
Use this:
Examples:
Now you know which actions are light and which ones are heavy.
If your product is live
Now you can get real numbers.
Cost per action ≈ total cost for that action ÷ number of runs
Do this for every action on your list.
At the end of this step you have:
You’ll use this in the next step to get cost per user per month.
You now know the cost of one run of each action.
Now ask: “For a normal paying user, how many times does each action run in a month?”
If you have users: use analytics to see how often each action is used per user.
If you don’t: make a simple and honest estimate for each action.
For each action: Cost per user for this action = (uses per month) × (cost per action)
Add all actions together: AI cost per active user per month.
You will use this number to:
AI is not your only cost.
You also pay for things like:
If you have users:
That gives other cost per user.
If you are pre-launch: Make a simple estimate, and update it later when you have real data.
Then: Total cost per user = AI cost per user + other cost per user
Later, you’ll set your prices so they are well above this total cost.
Remember: As you grow, this number can change (for example, servers can get cheaper per user).
So check it again from time to time.
You can charge in many ways:
Here we’ll assume you use credits, either alone or on top of a subscription.
Use your cost per action from step 3.
Pick simple numbers like 1, 2, 5, 10 and assign them to actions:
More expensive actions should use more credits.
From steps 4 and 5, you know:
Decide how much AI cost you’re happy to include in this plan (for example, “about $5 of AI per user”).
Then:
In the UI, you can show something simple like: “You’ve used 700 / 1,500 AI credits this month.”
You can also save a lot by changing how you call the model.
Here are a few simple tricks that don’t hurt UX:
Use the cheap model most of the time. Use it for normal chat, small edits, and short summaries. Most people won’t see the difference. They only care that it’s fast and clear.
Keep context small. Don’t send the whole chat every time. Make a short summary of old messages and send that instead. For documents, ask the user what they care about, and only send those parts.
Keep answers short. In your prompt, say the AI something like: “Keep the answer under 200 words.” “Give 3 bullet points only.”
Cache old answers. If people ask the same question many times, save the answer. Next time the question is the same, show the saved answer instead of calling the AI again.
These simple steps can cut your AI bill a lot.
This is really practical. The "think in AI actions" framing is exactly right. I run an AI calorie tracking app (Healthien) where users snap photos of food and get nutritional breakdowns. Early on I was trying to estimate costs per token and it was impossible to plan around. Once I reframed it as "one photo analysis = one AI action" everything clicked. I ended up with a free tier (limited scans per day) and subscriptions for unlimited use. The hardest part was figuring out the free tier ceiling. Too generous and you bleed money, too stingy and nobody converts. Still tweaking it honestly.
Solid framework. The action-based thinking is spot on - I build small business finance tools and had to work through this exact problem when adding AI categorization to a CSV processing workflow.
One nuance worth adding: for tools where the AI gets smarter per user over time (like learning their custom categories or preferences), your cost per user actually decreases the longer they stick around. First month might cost you 3x what month six costs because the system has learned their patterns and needs fewer expensive calls. That changes the math on how aggressively you can price early tiers to drive adoption.
The credit system approach works well for B2B tools but I have found that for prosumer or solo founder tools, simplicity wins over precision. Most solo founders would rather pay a flat $15/mo with generous limits than think about credits at all. Credits add cognitive overhead that can hurt conversion even if the economics are better for you.
Also agree hard on the cheap vs expensive model split. For something like transaction categorization, a smaller model with good few-shot examples outperforms a larger model with generic prompting almost every time. The expensive model is only worth it for ambiguous edge cases.
The "think in AI actions, not tokens" reframe is probably the most useful thing here. I spent way too long trying to estimate token costs per feature before realizing users don't think in tokens at all. They think in "I clicked the button and it did the thing." Pricing should match that mental model.
One thing I'd add to the caching point — semantic caching is a game changer if you haven't tried it. Instead of exact-match caching, you embed the query and check if there's a cached response within a similarity threshold. For something like a support chatbot where 40% of questions are slight variations of the same 20 questions, this alone cut our API costs by about a third.
The cheap vs expensive model split is something I wish I'd done from the start. I was running everything through GPT-4 class models for months before I actually benchmarked and realized that for 70% of my use cases, a smaller model produced identical user satisfaction scores. The remaining 30% where it mattered were the complex reasoning tasks — and those are exactly where users expect to wait a beat longer anyway.
Also worth mentioning: prompt caching on Anthropic and the batch API on OpenAI. If you have non-real-time workloads (nightly report generation, background analysis), the batch API gives you 50% off. That's not a small optimization.
ai pricing is genuinely the thing that keeps me up at night. running two ai-powered apps (astrologica for personalized horoscope podcasts, speakeasy for article-to-audio conversion) and the cost per user varies wildly depending on usage
the trick ive found is capping the expensive operations. astrologica generates one podcast per day per user - thats predictable. speakeasy lets you convert a few articles per month on the free tier. that way i can actually model my unit economics without getting destroyed by one power user converting 50 articles a day
the biggest mistake i made early on was pricing based on what competitors charge instead of what it actually costs me to serve each user. speechify charges 140/yr but they have massive scale advantages. as a solo dev i had to price differently
subscription with usage limits > pure usage-based pricing for consumer apps imo. users hate surprise bills