A solo founder reached out before their launch. They'd built their entire web and mobile app using Claude Code — fast, functional, clean UI at first glance. They wanted a QA audit before real users touched it.
I logged 40+ issues. 12 were critical.
Here's what broke and why it matters if you're building with AI tools:
Edge cases the AI never considered
Most critical issues were edge cases — empty field submissions, network drops mid-action, special characters in inputs. The AI built exactly what it was told to build. Nobody told it to ask "what if this goes wrong?" So it didn't.
The app either crashed silently, threw a generic unhandled error, or worse — appeared to succeed while doing nothing.
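To make those three failure modes concrete, here is a minimal sketch in Python. It is not the founder's code: `SubmitResult`, `submit_display_name`, the `send` callback and the 80-character limit are all hypothetical names and values chosen for illustration. The point is only that every path (empty input, unsupported characters, a dropped connection) returns an explicit result instead of crashing or appearing to succeed.

```python
from dataclasses import dataclass


@dataclass
class SubmitResult:
    ok: bool
    message: str


def submit_display_name(raw: str, send) -> SubmitResult:
    """Validate a form field, then send it. Every failure path is reported.

    `send` is a hypothetical callback standing in for the real network call;
    it is assumed to raise ConnectionError when the network drops.
    """
    name = raw.strip()
    if not name:
        # Empty or whitespace-only submission.
        return SubmitResult(False, "Name is required.")
    if len(name) > 80:  # assumed limit, for illustration only
        return SubmitResult(False, "Name must be 80 characters or fewer.")
    if any(not ch.isprintable() for ch in name):
        # Control characters such as NUL; accents and apostrophes are allowed.
        return SubmitResult(False, "Name contains unsupported characters.")
    try:
        send(name)
    except ConnectionError:
        # Network dropped mid-action: say so instead of appearing to succeed.
        return SubmitResult(False, "Network error. Your change was not saved.")
    return SubmitResult(True, "Saved.")
```

The caller can then branch on `result.ok` and show `result.message`, so the "appeared to succeed while doing nothing" case has nowhere to hide.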
Regression after fixes
The founder went back to Claude Code to fix the reported issues. Several fixes broke adjacent features. A login flow fix broke session handling downstream. A UI fix on one screen misaligned another.
AI fixes what you tell it to fix. Precisely that, nothing more. Without someone tracking the full scope of changes, regression stacks up fast.
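One way to track that scope is a regression check that covers the downstream behaviour, not just the line that was fixed. The sketch below is an assumption-heavy illustration: `AuthService` is a toy in-memory stand-in, not the audited app's code, and `check_login_and_session` is a hypothetical helper. It shows the idea of re-running the session assertions every time the login flow changes.

```python
import secrets


class AuthService:
    """Toy in-memory stand-in for the app's login and session code."""

    def __init__(self, users):
        self._users = dict(users)  # username -> password (plain text only in this sketch)
        self._sessions = {}        # token -> username

    def login(self, username, password):
        if username not in self._users or self._users[username] != password:
            return None
        token = secrets.token_hex(16)
        self._sessions[token] = username
        return token

    def current_user(self, token):
        return self._sessions.get(token)

    def logout(self, token):
        self._sessions.pop(token, None)


def check_login_and_session(auth):
    """Run after every change to login, not only after session changes."""
    # The behaviour that was "fixed":
    assert auth.login("ana", "wrong") is None
    token = auth.login("ana", "s3cret")
    assert token is not None
    # Downstream behaviour that a login-only fix can silently break:
    assert auth.current_user(token) == "ana"
    auth.logout(token)
    assert auth.current_user(token) is None
    return True
```

Running `check_login_and_session(AuthService({"ana": "s3cret"}))` after each AI-generated fix is cheap, and it fails loudly when a login change breaks session handling.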
No consistency across error states
The same class of error — a failed network request — was handled differently across features. Modal here, inline message there, silence somewhere else. Each individually defensible. Together, an unpredictable experience that erodes user trust.
AI has no memory across prompts. Nobody was holding the whole product in their head. That's a human job.
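A common way to hold that decision in one place is a single error policy that every feature consults. The following is a sketch under stated assumptions: the exception classes, the `Presentation` enum, the messages and `present_error` are hypothetical, and which presentation is right for which error is a product decision this example does not settle.

```python
from enum import Enum


class Presentation(Enum):
    INLINE = "inline"
    MODAL = "modal"


class NetworkError(Exception):
    """Hypothetical: a request failed or timed out."""


class ValidationError(Exception):
    """Hypothetical: the user's input was rejected."""


# One table for the whole app: the same class of error always looks the same.
ERROR_POLICY = {
    NetworkError: (Presentation.INLINE, "Network problem. Please try again."),
    ValidationError: (Presentation.INLINE, "Please check the highlighted fields."),
}

# Anything unrecognised still gets a visible, consistent message, never silence.
FALLBACK = (Presentation.MODAL, "Something went wrong.")


def present_error(exc):
    """Return (presentation, message) for an exception, using the shared policy."""
    for error_type, policy in ERROR_POLICY.items():
        if isinstance(exc, error_type):
            return policy
    return FALLBACK
```

Each screen then calls `present_error(exc)` instead of deciding locally, so a prompt that touches one feature cannot quietly invent a fourth way to show a failed request.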
I'm Redion, founder of QAura — I do QA audits for startups, especially ones building with AI tools. Happy to answer questions or take a look at what you're building.
Are you shipping something built with Cursor, Bolt, or Lovable? Drop it below — I'll give you honest feedback.
The edge case problem is the one I keep seeing too. I've built most of Genie 007 with AI assistance over 15 months, and the pattern is consistent: AI writes code that handles the happy path perfectly. The failure modes it misses are almost always the ones you listed. What helped me was writing test cases for each feature before asking AI to build it, then running those scenarios after. Doesn't eliminate regression but it cuts the critical issues significantly.
The regression-after-fix pattern worries me most in AI-only codebases. AI fixes exactly what you tell it to fix and nothing else; you nailed it. One thing I've noticed when I dictate prompts into Claude Code with DictaFlow instead of typing them is that speaking forces me to be concise. Typing tends to produce over-specific prompts that still miss edge cases. But when you speak the intent naturally, the prompt gets shorter and the AI has to fill in more, which can be better or more dangerous depending on who's reviewing the output. Either way, the human holding the whole product in their head still has the job that matters most.
Using AI tools requires developers to have strong logical thinking skills, and I've encountered many pitfalls along the way.
These are the kinds of issues that we, programmers from before AI, used to fix on our own, because we broke things on our own.
I think that's why we know where to look and what to focus on, whereas people coming into the industry and shipping products fully with AI are not aware of the complexity of seemingly simple features like login, sessions, and authentication.
From their perspective, it just works. They've used these features many times while browsing the web. But building something that doesn't break takes countless iterations.
No hating on vibe coders though - pure love for all - I am actually using AI more than ever now and I know how hard it is to build something solid from scratch.
solid breakdown, the regression one especially. there's a sibling to your edge cases a QA pass can miss: the security bugs that never crash or throw. the AI builds a route that returns the right data in the demo and never checks the caller owns it, so changing one id in the url hands back someone else's record. same root cause you named. the question nobody prompted was who this should refuse, not just what could break it. that login-and-session area you flagged is usually where those access bugs sit too.
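a rough sketch of that missing check, in Python. everything here is hypothetical (the `RECORDS` store, `get_record`, the `Forbidden` and `NotFound` exceptions, the user ids); it only illustrates comparing the record's owner to the caller before returning anything.

```python
class NotFound(Exception):
    """Hypothetical: no record with that id."""


class Forbidden(Exception):
    """Hypothetical: the record exists but belongs to someone else."""


# Toy in-memory store standing in for the real database.
RECORDS = {
    1: {"owner_id": "u1", "note": "alice's note"},
    2: {"owner_id": "u2", "note": "bob's note"},
}


def get_record(record_id, caller_id, records=RECORDS):
    """Return a record only if the authenticated caller owns it."""
    record = records.get(record_id)
    if record is None:
        raise NotFound(record_id)
    # The check the demo never exercises: does the caller own this record?
    if record["owner_id"] != caller_id:
        raise Forbidden(record_id)
    return record
```

with this in place, `get_record(2, "u1")` raises `Forbidden` instead of handing back u2's record. `caller_id` is assumed to come from the verified session, never from the url.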
Interesting findings. The "AI fixes exactly what you ask it to fix" point really stands out. Speed is valuable, but without QA and regression testing, it's easy to accumulate hidden issues. I'm curious which category of bugs shows up most often in AI-built products.
Interesting.
The thing I'd be careful with is treating this as an AI-coding problem only.
A lot of what you're describing sounds like missing product decisions showing up as bugs.
The useful question may not be what AI broke, but which decisions nobody made before the code existed.
That's a fair point, and you are right that a lot of edge cases come down to decisions that were never made, not AI limitations.
The difference I would add is that with traditional development those gaps are caught earlier, in the back-and-forth with the developer. With AI you can ship a complete-looking product so fast that nobody asks "what happens if we change this?", so the missing decisions stay hidden longer.
A QA audit helps catch those before clients do.