Or be a really good one. As the Replit founder suggested at a hackathon: "All software could be considered an AWS wrapper."
I built and launched an AI content-writing app at my previous company, which scaled to hundreds of millions of generations. I learnt some lessons the hard way and wanted to share them, FWIW.
Consider these a checklist for taking your AI app to production. For each one, I highlight the mistake I made and the solutions that worked.
🔴 Mistake #1: Didn't think about output validations
This can leave you with entire user cohorts who never like the output, because their inputs look nothing like your test users'.
✅ Solutions:
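For illustration (a minimal sketch of my own, with made-up thresholds, not a specific product's validation layer): check every generation before it reaches the user, and retry or fall back when it fails.

```python
# Sketch: validate an LLM generation before showing it to the user.
# Thresholds and checks here are placeholders; tune them per use case.

def validate_generation(text: str, min_words: int = 20, max_words: int = 800) -> list[str]:
    """Return a list of validation failures; an empty list means the output passed."""
    problems = []
    if not text.strip():
        problems.append("empty output")
    words = text.split()
    if len(words) < min_words:
        problems.append("too short")
    if len(words) > max_words:
        problems.append("too long")
    # Hypothetical check: the model echoed unresolved template variables back
    if "{{" in text or "}}" in text:
        problems.append("unresolved template variables")
    return problems
```

A caller can retry with a stricter prompt, or route to a fallback model, whenever the returned list is non-empty.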
🔴 Mistake #2: Thought "who would DDoS me?"
DDoS on a regular server means you might experience downtime; DDoS on an LLM system means downtime plus a huge inference bill.
✅ Solutions:
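One illustrative mitigation (my own sketch, with made-up numbers) is a daily spend cap, i.e. a cost circuit breaker: abusive traffic degrades to errors instead of an open-ended bill.

```python
# Sketch: a daily spend cap ("cost circuit breaker") for LLM calls.
# The limit and per-request cost accounting are placeholders.
import time

class SpendCap:
    def __init__(self, daily_limit_usd: float):
        self.daily_limit_usd = daily_limit_usd
        self.spent_today = 0.0
        self.day = time.strftime("%Y-%m-%d")

    def record(self, cost_usd: float) -> None:
        today = time.strftime("%Y-%m-%d")
        if today != self.day:  # reset the counter at the day boundary
            self.day, self.spent_today = today, 0.0
        self.spent_today += cost_usd

    def allow(self) -> bool:
        """Refuse new inference calls once today's budget is exhausted."""
        return self.spent_today < self.daily_limit_usd
```

In practice you'd back this with shared storage (e.g. Redis) so it holds across app instances, and pair it with normal network-level DDoS protection.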
🔴 Mistake #3: Didn't limit users
Similar to DDoS: some users simply consume far more than others, and with a shared API key a few heavy users meant a worse experience for everyone else.
✅ Solutions:
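As one common pattern (a sketch, not necessarily what we shipped): a per-user token bucket, so a single heavy user exhausts their own budget instead of starving the shared key.

```python
# Sketch: per-user token-bucket rate limiting in front of a shared LLM key.
# capacity = burst size; refill_per_sec = sustained requests per second.
import time
from collections import defaultdict

class UserRateLimiter:
    def __init__(self, capacity: int, refill_per_sec: float):
        self.capacity = capacity
        self.refill = refill_per_sec
        # Each user starts with a full bucket.
        self.buckets = defaultdict(lambda: (float(capacity), time.monotonic()))

    def allow(self, user_id: str) -> bool:
        tokens, last = self.buckets[user_id]
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.refill)
        if tokens >= 1:
            self.buckets[user_id] = (tokens - 1, now)
            return True
        self.buckets[user_id] = (tokens, now)
        return False
```

Like the spend cap, a real deployment would keep the buckets in shared storage rather than process memory.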
🔴 Mistake #4: Didn't care about latency
Inference API latencies can easily break the snappy user experience an app is supposed to deliver.
✅ Solutions:
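One piece of this (an illustrative sketch, independent of any particular provider): stream tokens to the UI as they arrive instead of waiting for the full completion, and track time-to-first-token, since that's the latency users actually perceive.

```python
# Sketch: pass streamed tokens straight through to the UI while measuring
# time-to-first-token (TTFT). `token_iter` stands in for a provider's stream.
import time

def stream_to_user(token_iter):
    """Yield tokens immediately; record TTFT for your latency dashboards."""
    start = time.monotonic()
    ttft = None
    for token in token_iter:
        if ttft is None:
            ttft = time.monotonic() - start  # log this to your metrics pipeline
        yield token
```

Caching repeated prompts and routing short/simple requests to faster models are the other usual levers.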
🔴 Mistake #5: Retro-fitted Datadog for logs & metrics
Datadog and similar tools aren't built for large text logs or probabilistic API outputs. Learnt this the hard way.
✅ Solutions:
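The shape of the fix, as a rough sketch (field names are placeholders): send only small metadata events to the metrics tool, and ship the large prompt/completion text to cheap blob storage, joined by a request id.

```python
# Sketch: split an LLM log into a small metric event (for Datadog-style
# tools) and a large text blob (for object storage), linked by request_id.
import json
import uuid

def split_llm_log(prompt: str, completion: str, latency_ms: float):
    request_id = str(uuid.uuid4())
    metric_event = {  # small and cheap to index
        "request_id": request_id,
        "latency_ms": latency_ms,
        "prompt_chars": len(prompt),
        "completion_chars": len(completion),
    }
    text_blob = json.dumps({  # large; goes to blob storage, fetched on demand
        "request_id": request_id,
        "prompt": prompt,
        "completion": completion,
    })
    return metric_event, text_blob
```

This keeps dashboards fast and the bill sane, while still letting you pull the full text when debugging a bad generation.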
🔴 Mistake #6: Didn't bother about data privacy
When you're small, you may not worry about the fines, but customers care a LOT about privacy. Think of privacy gaps as a leaky bucket: you quietly lose customers over them.
✅ Solutions:
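One small illustrative piece (a naive regex sketch; a real deployment would use a proper PII detection library): redact obvious PII before the prompt ever leaves your servers for a third-party provider.

```python
# Sketch: strip obvious PII (emails, phone-like numbers) from text before
# sending it to an external LLM API. Regexes here are deliberately simple.
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def redact_pii(text: str) -> str:
    text = EMAIL.sub("[EMAIL]", text)
    text = PHONE.sub("[PHONE]", text)
    return text
```

Pair this with the provider's data-retention settings and a clear privacy policy so customers know where their text goes.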
Getting prod-ready is a marathon, not a sprint!
P.S. - I'm building a tool to help gen ai apps & features become prod-ready gaining from my experience (portkey.ai). Happy to give IH folks a demo.
the replit analogy is spot on. every SaaS is a wrapper around something — the value is in the workflow and the decisions you make on top of the raw API.
one thing I learned building on top of LLMs: the routing layer is where the real value lives. choosing which model handles which task, when to use the expensive one vs the cheap one, how to handle failover — that is genuine product logic, not just wrapping an API call.
the wrapper problem only applies when you are literally just proxying API calls with a UI on top. if you are making intelligent decisions about the infrastructure layer, that is a real product.
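The routing idea above can be sketched roughly like this (model names and the `call_model` callable are placeholders, not any real provider's API): pick a fallback chain per task, cheap-first for easy work, quality-first otherwise, and fail over down the chain.

```python
# Sketch of a routing layer: choose a model chain per request, then fail
# over through the chain on provider errors. All names are hypothetical.

def route(task: str, prompt_tokens: int) -> list[str]:
    """Return an ordered fallback chain of (hypothetical) model names."""
    if task == "classification" or prompt_tokens < 200:
        return ["cheap-small-model", "big-expensive-model"]  # cheap first
    return ["big-expensive-model", "cheap-small-model"]      # quality first

def complete(task: str, prompt: str, call_model) -> str:
    last_err = None
    for model in route(task, len(prompt.split())):
        try:
            return call_model(model, prompt)  # fail over on provider errors
        except Exception as e:
            last_err = e
    raise RuntimeError("all models failed") from last_err
```

The routing policy is exactly the "genuine product logic" the comment describes: it encodes your judgment about cost, quality, and reliability trade-offs.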
This is super interesting, and it seems especially relevant to enterprise customers. Have you thought about that?
Yes, I'm guessing this is useful to anyone running LLMs in production. Enterprises will need a lot more.
That's why I tried adding the "Basic", "Advanced" and "Expert" categories.
Great checklist! I also faced the same learnings from mistakes 4, 5, and 6. I'm curious, where did you get your TOS and privacy policy from, and did you have these terms in place before you launched?