Last week we shipped sales-email auto-classification on FORMLOVA, the chat-first form service we've been building.
Here is the part that is probably interesting to this crowd: we made it free on every plan, including the free tier. No upgrade prompt. No quota. The LLM cost sits entirely on us.
I want to walk through how we arrived at that call, because it was not a "we're generous" decision. It was a cost-math decision that flipped a common SaaS instinct.
When a form response arrives, we classify its content with an LLM into one of three labels: legitimate, sales, or suspicious. The label is shown in the dashboard and can be used to exclude sales emails from analytics, filter workflows, or auto-reply only to certain buckets. The operator can correct a label by hand, and that correction is never overwritten by future auto-runs.
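The labeling behavior described above can be sketched as a small state machine. These names are illustrative, not FORMLOVA's actual schema; the point is only the rule that a manual correction pins the label against future auto-runs.

```typescript
// Illustrative sketch of the label model (not FORMLOVA's real schema).
type Label = "legitimate" | "sales" | "suspicious";

interface ResponseRecord {
  id: string;
  label: Label | null;        // null until classified (or if the call failed)
  manuallyCorrected: boolean; // set once the operator overrides the label
}

// Apply an auto-classification result. A manual correction always wins:
// future auto-runs never overwrite it.
function applyAutoLabel(record: ResponseRecord, autoLabel: Label): ResponseRecord {
  if (record.manuallyCorrected) return record;
  return { ...record, label: autoLabel };
}

// Operator override: store the label and pin it against future auto-runs.
function applyManualLabel(record: ResponseRecord, label: Label): ResponseRecord {
  return { ...record, label, manuallyCorrected: true };
}
```

The `manuallyCorrected` flag is the whole trick: one boolean decides whether the classifier is allowed to write.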
As far as we can find, no form service does this today. CAPTCHA stops bots, but no one tries to classify the content of the human-written messages that get through.
When you add an AI feature, the default pricing instinct is to gate it behind a paid plan.
This is sensible. LLM calls cost real money. Pricing controls cost. Done.
We almost did the same. We had a plan: "AI spam classification is a Standard-and-above feature. Free users get one manual filter column." Two days of pricing sketches later we scrapped it. The math made a different decision for us.
Our classifier uses Claude Haiku 4.5 via OpenRouter. It runs asynchronously after form submission.
Per classification: roughly $0.0002, which is about 0.03 Japanese yen.
So a free-tier user hitting our monthly response cap (100 responses) costs us $0.02 per user per month in classification.
A Standard-plan user with a 1,000-response monthly cap costs us $0.20.
A Premium user with 10,000 responses costs $2.
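The arithmetic above, made explicit. The per-call figure is the rough number from this post, not a quoted price:

```typescript
// Monthly classification cost per user, assuming every response up to the
// plan's cap triggers exactly one LLM call.
const PER_CALL_USD = 0.0002; // rough Claude-Haiku-via-OpenRouter figure

function monthlyCostUSD(responseCap: number, perCallUSD: number = PER_CALL_USD): number {
  return responseCap * perCallUSD;
}
```

`monthlyCostUSD(100)` is about $0.02, `monthlyCostUSD(1000)` about $0.20, `monthlyCostUSD(10000)` about $2 -- the plan caps are what make the worst case computable at all.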
Those numbers do not bend the business. They are not even a rounding error on infrastructure spend.
Here's the hidden cost of gating a $0.02-per-user feature: plan checks in the backend, a gated state in the dashboard UI, quota tracking and its edge cases, and the support load of explaining the gate to free users.
All of this to protect $0.02 of LLM cost per user per month.
Building the gating infrastructure was going to cost us more engineering time, and cost users more confusion, than just absorbing the bill.
So we absorbed the bill.
The cost math made the decision easy. The positioning made the decision obvious.
For anyone running paid ads -- our target users -- filtering sales emails out of your inquiry pipeline isn't a premium feature. It is the baseline for calculating CVR correctly. If your form is delivering 10 responses and 8 are sales pitches, your ad CVR numbers are overstated 5x. That's not a luxury problem. That's a "did the campaign work?" problem.
If we charged for this, we'd be saying: "We'll tell you the truth about your ad performance, but only if you pay extra for the truth." That framing was indefensible. So we put it on every plan.
In other words: the feature's job is to make the rest of the product's numbers trustworthy. Gating it would undermine the trust the rest of the product is trying to build.
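Worked through with the post's numbers (10 responses, 8 sales pitches), the distortion is just the ratio of total to legitimate responses:

```typescript
// Measured CVR counts every form response as a conversion; true CVR counts
// only legitimate ones. Both share the same click denominator, so the
// inflation factor is simply total / legitimate responses.
function cvrInflation(totalResponses: number, salesResponses: number): number {
  const legitimate = totalResponses - salesResponses;
  return totalResponses / legitimate; // 10 responses, 8 spam -> 10 / 2 = 5x
}
```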
We now have a positioning line we couldn't use before: "FORMLOVA ships sales-email detection free on every plan -- the only form service that does." That is a short, specific, defensible line. It is not competing on feature count or polish. It is competing on what is included by default when you sign up.
Every "default included" is a line you can draw against competitors. Free tiers are not only acquisition tools -- they are positioning statements.
I should be honest about the shape of this. Not every AI feature can be free on every plan. Three things made this one qualify: the unit of work is bounded (one short classification per response), usage has a natural cap (plan response limits), and it runs asynchronously, so there's no latency pressure on the user.
If we were generating form copy from scratch (long prompts, long outputs, no natural cap), the free-on-every-plan approach would not survive the math.
Even with the cost math in our favor, we didn't go in without cost controls:
- `max_tokens: 256` on the output. The JSON is small; we don't let the model run long.
- `temperature: 0`. Deterministic, which keeps prompt caching efficient.
- On failure, we fall back to `null` labels, so the form submission itself never breaks.

That last one matters. And paid-event forms (Stripe Connect) skip classification entirely -- people very rarely spend money just to send you a sales pitch. That's a free efficiency gain.
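A sketch of what those controls look like in the call itself. The endpoint and OpenAI-compatible payload shape are OpenRouter's; the model slug, prompt, and function names are assumptions, not FORMLOVA's actual code.

```typescript
type Label = "legitimate" | "sales" | "suspicious";

// Parse the model's JSON output defensively: anything unexpected becomes
// null rather than an exception, so the submission path never breaks.
function parseLabel(raw: string): Label | null {
  try {
    const label = JSON.parse(raw)?.label;
    if (label === "legitimate" || label === "sales" || label === "suspicious") {
      return label;
    }
    return null;
  } catch {
    return null;
  }
}

async function classify(message: string, apiKey: string): Promise<Label | null> {
  try {
    const res = await fetch("https://openrouter.ai/api/v1/chat/completions", {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        model: "anthropic/claude-haiku-4.5", // assumed OpenRouter slug
        max_tokens: 256, // the JSON is small; don't let the model run long
        temperature: 0,  // deterministic; keeps prompt caching efficient
        messages: [
          {
            role: "user",
            content:
              `Classify this form submission as "legitimate", "sales", or ` +
              `"suspicious". Reply with JSON: {"label": "..."}.\n\n${message}`,
          },
        ],
      }),
    });
    const data = await res.json();
    return parseLabel(data?.choices?.[0]?.message?.content ?? "");
  } catch {
    return null; // classification failure never blocks the submission
  }
}
```

The outer try/catch plus the defensive parser is the "null labels" control: every failure mode degrades to "no label yet," never to a broken form.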
Worst case: a wave of spam bots starts hammering our endpoints, triggering a classification call on every submission. Costs climb, we burn a couple hundred dollars, and we add the quota we didn't build initially. Cost-bounded, reversible.
Best case: we get to say "we give you trustworthy pipeline numbers by default, on every plan," and that line does a lot of quiet positioning work.
I think best case and worst case differ by about two orders of magnitude. The expected value is strongly positive.
When you're adding an AI feature, run the math before you build the gating. Calculate cost per user per month, not cost per call. Compare it to the engineering time and user friction the gating will cost. If it's a small async feature where the unit of work is bounded, you might be looking at a feature that should be on by default.
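The rule of thumb above as a back-of-envelope check. Every input is yours to estimate (the numbers below are hypothetical); the function is just the comparison the post describes:

```typescript
// Compare total monthly LLM spend against the amortized cost of building
// and maintaining the gate. Deliberately crude: if the LLM bill is smaller
// than the gate, the gate isn't paying for itself.
function gatingIsWorthIt(opts: {
  perCallUSD: number;            // cost of one LLM call
  cappedCallsPerUser: number;    // natural monthly cap on calls per user
  users: number;                 // users on the plan you'd otherwise gate
  gatingCostUSDPerMonth: number; // eng time + support burden, amortized
}): boolean {
  const llmSpend = opts.perCallUSD * opts.cappedCallsPerUser * opts.users;
  return llmSpend > opts.gatingCostUSDPerMonth;
}
```

At this post's numbers, even 1,000 free-tier users at their cap is about $20/month of LLM spend -- far below any realistic gating cost.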
Features gated behind Pro plans are a position. Features free on every plan are also a position. Pick the one that fits what you're trying to be.
For us, AI spam classification being free on every plan says: we want your CVR numbers to be real. That is the position we wanted.
Related posts:
- after() + OpenRouter): coming soon on DEV

Free to start at formlova.com. Connect via MCP from Claude, ChatGPT, or any MCP client.
Love this level of transparency! The math ($0.0002 per call) really puts things into perspective. For AI-RPA projects like mine, managing token burn for autonomous agents is a huge concern for users. Making features like this 'default free' is a killer positioning move. Are you using any specific caching strategy to keep those Claude Haiku costs so low?
Thanks — glad the numbers resonated. AI-RPA is exactly the kind of use case where per-call cost visibility matters, because the blast radius of a silent token burn is huge once agents run unattended.
Honest answer on Haiku costs: most of the saving isn't from caching, it's from scope. We only classify one thing (is this submission sales spam or legit?), so prompts are short, outputs are a single label + score, no conversation history, no tool use, no retries. Single-shot, stateless, tiny context window — that alone gets you most of the way to $0.0002.
On top of that, we're not using Anthropic prompt caching yet -- the prompts are short enough that the break-even point isn't there. If we ever move to longer system prompts with few-shot examples, that'll change.
Curious how you're handling token budgeting on the RPA side — hard caps per agent, or more of a soft budget with alerts?