Your AI product was working fine yesterday. Today, users are complaining it’s spitting out garbage.
The model didn’t suddenly get dumber. Something in your data, pipeline, or setup broke — quietly.
Here’s how you can figure out what went wrong with your AI product and fix it fast.
Step 1. Reproduce the problem first
Before changing anything, confirm exactly what’s broken.
What to do:
- Get 3–5 cases where users reported wrong answers. Note the exact input and the expected output.
- Run those inputs through your app manually and see what results you get.
- Compare your app’s outputs with what you expected.
- Keep a short “Broken Cases” list in Google Sheets or Docs. You’ll reuse these examples later to test your fixes.
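If you'd rather script this than run each case by hand, here is a minimal sketch that replays your broken cases and flags mismatches. The `call_my_app` function and the `broken_cases.csv` file are placeholders for your own app entry point and your exported "Broken Cases" list.

```python
import csv

def call_my_app(user_input: str) -> str:
    """Placeholder: replace with a call into your own app or API wrapper."""
    raise NotImplementedError

# broken_cases.csv columns: input, expected_output (exported from your sheet)
with open("broken_cases.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        actual = call_my_app(row["input"])
        status = "OK" if actual.strip() == row["expected_output"].strip() else "MISMATCH"
        print(f"{status}: input={row['input'][:60]!r}")
        if status == "MISMATCH":
            print(f"  expected: {row['expected_output'][:80]!r}")
            print(f"  actual:   {actual[:80]!r}")
```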
Step 2. Check your inputs before blaming the model
Most AI “bugs” come from bad inputs, not the model itself.
What to check:
- Are any fields missing? (e.g., empty user IDs, blank text fields)
- Are timestamps wrong or in the wrong format?
- Are you feeding the model the right language or labels?
- If you’re using embeddings:
  - Are you actually fetching the right chunks from your vector database?
  - Is your similarity search threshold set correctly?
How to spot bad inputs quickly: pull a sample of recent records and check every field against what the model expects (no empty values, timestamps in the right format, the language and labels you intended).
Automation tip: run that same check automatically on new data so bad inputs get flagged before they ever reach the model; a minimal script for this is sketched below.
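Here is a minimal sketch of such a check, assuming your inputs arrive as a list of dictionaries. The field names (`user_id`, `text`, `timestamp`) and the ISO 8601 timestamp format are placeholders; swap in your own schema.

```python
from datetime import datetime

REQUIRED_FIELDS = ["user_id", "text", "timestamp"]  # placeholder: your own schema

def validate_record(record: dict) -> list[str]:
    """Return a list of problems found in one input record."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or (isinstance(value, str) and not value.strip()):
            problems.append(f"missing or empty field: {field}")
    # Check the timestamp format (assuming ISO 8601, e.g. 2024-05-01T12:00:00)
    ts = record.get("timestamp")
    if isinstance(ts, str) and ts:
        try:
            datetime.fromisoformat(ts)
        except ValueError:
            problems.append(f"bad timestamp format: {ts!r}")
    return problems

# Example usage on a batch of records
records = [{"user_id": "", "text": "hello", "timestamp": "01/05/2024"}]
for i, rec in enumerate(records):
    for problem in validate_record(rec):
        print(f"record {i}: {problem}")
```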
Step 3. Test your model API calls directly
Sometimes your app is fine — it’s the integration or wrapper that’s wrong.
What to do:
- Take one failing example from Step 1.
- Call the model API directly using Postman or curl.
- Compare:
  - The raw API response
  - What your app showed the user
How to interpret results:
- If the raw API response looks good → your bug is in your code.
- If the raw API response is wrong → the issue is upstream (model settings, embeddings, or data).
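If you prefer scripting the direct call over Postman, here is a sketch using Python's requests library, assuming an OpenAI-style chat completions endpoint. The URL, model name, and environment variable are assumptions; swap in whatever your provider and app actually use.

```python
import os
import requests

# Assumes an OpenAI-style chat completions endpoint; adjust for your provider.
API_URL = "https://api.openai.com/v1/chat/completions"
API_KEY = os.environ["OPENAI_API_KEY"]

failing_input = "paste one failing example from Step 1 here"

response = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "gpt-4o-mini",  # placeholder: the model your app uses
        "messages": [{"role": "user", "content": failing_input}],
    },
    timeout=30,
)
response.raise_for_status()

# Print the raw response, then compare it with what your app showed the user.
print(response.json()["choices"][0]["message"]["content"])
```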
Step 4. Trace dependencies one by one
AI apps rely on many moving pieces: vector DBs, APIs, caches, spreadsheets, etc. One broken link breaks everything.
What to do:
Work backward from the final output:
User Output → Vector DB → Storage → API → Model → Input Source
At each step:
- Send a test request.
- Log the raw response.
- Compare expected vs. actual.
For example:
- If vector DB results are empty → your embeddings may not be updating.
- If an external API returns a 429 (rate limit) → throttle requests or retry.
- If cache returns old data → clear cache and retest.
Automation tip: wrap these checks in a small script that runs on a schedule and logs each dependency’s raw response; a minimal version is sketched below.
This turns debugging from a panic into a morning checklist.
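Here is a minimal sketch of that script, assuming each dependency exposes an HTTP endpoint you can ping. The URLs are placeholders, and the retry logic matches the 429 rate-limit advice above.

```python
import time
import requests

# Placeholder endpoints: replace with your real vector DB, storage, and API URLs.
DEPENDENCIES = {
    "vector_db": "https://your-vector-db.example.com/health",
    "storage": "https://your-storage.example.com/health",
    "external_api": "https://api.example.com/status",
}

def check(name: str, url: str, retries: int = 3) -> None:
    """Ping one dependency, backing off and retrying if it rate-limits us."""
    for attempt in range(retries):
        resp = requests.get(url, timeout=10)
        if resp.status_code == 429:  # rate limited: wait, then retry
            time.sleep(2 ** attempt)
            continue
        print(f"{name}: HTTP {resp.status_code}, body={resp.text[:120]!r}")
        return
    print(f"{name}: still rate limited after {retries} attempts")

for name, url in DEPENDENCIES.items():
    try:
        check(name, url)
    except requests.RequestException as exc:
        print(f"{name}: request failed ({exc})")
```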
Step 5. Build a simple debug dashboard
This step helps you spot problems early, before users start complaining.
We’ll keep it simple and use Google Sheets.
How to build it step by step:
1. Create a new Google Sheet
Open Google Sheets and create a blank sheet.
2. Add four columns
Name them:
- Input: the data or prompt you send to the model
- Output: what the model returns
- API Response Time: how long the request took (in seconds)
- Error Flag: shows if something failed (e.g., “true” or “false”)
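To produce those four values consistently, it helps to time the model call and catch failures in one place. Here is a minimal sketch; `call_model` is a hypothetical stand-in for your real model or API call.

```python
import time

def call_model(prompt: str) -> str:
    """Placeholder: your real model or API call goes here."""
    raise NotImplementedError

def process_request(prompt: str) -> dict:
    """Return one dashboard row: input, output, response time, error flag."""
    start = time.time()
    output, error = "", False
    try:
        output = call_model(prompt)
    except Exception:
        error = True
    return {
        "Input": prompt,
        "Output": output,
        "API Response Time": round(time.time() - start, 2),
        "Error Flag": str(error).lower(),
    }
```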
3. Use Zapier to send data automatically
Instead of filling the sheet manually, you can have Zapier do it:
- Go to Zapier and create a new Zap.
- Set your app or API as the trigger (e.g. “New API Call” or “New Log Entry”).
- Add an action → choose “Add Row to Google Sheets.”
- Pick the sheet you just created.
- Map the fields: input, output, API response time, and error flag.
- Test the Zap and turn it on.
From now on, every time your app processes a request, Zapier will automatically add a new row to the sheet.
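If you'd rather not use Zapier, a few lines of code can append the same row directly. This is a sketch using the gspread library; it assumes you've created a Google service account with access to the sheet, and the sheet name and example values are placeholders.

```python
import gspread

# Assumes a service-account JSON key shared with the sheet; see gspread's docs for setup.
gc = gspread.service_account(filename="service_account.json")
worksheet = gc.open("AI Debug Dashboard").sheet1  # placeholder sheet name

def log_request(input_text: str, output_text: str, response_time_s: float, error: bool) -> None:
    """Append one row matching the four dashboard columns."""
    worksheet.append_row([input_text, output_text, response_time_s, str(error).lower()])

# Example: call this after every request your app processes.
log_request("What is our refund policy?", "", 3.4, True)
```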
4. Set up conditional formatting
This makes problems stand out visually:
For API Response Time
- Select the entire “API Response Time” column.
- Go to Format → Conditional formatting.
- Under “Format cells if…”, choose Greater than.
- Enter 2 (meaning anything slower than 2 seconds).
- Pick a red highlight.
For Output
- Select the “Output” column.
- Go to Format → Conditional formatting.
- Under “Format cells if…”, choose Is empty.
- Pick an orange highlight.
For Error Flag
- Select the “Error Flag” column.
- Go to Format → Conditional formatting.
- Under “Format cells if…”, choose Text is exactly.
- Type true or error.
- Make it bold red.
5. Review the dashboard daily
Spend 5 minutes each morning checking the sheet:
- If you see a red cell in the “API Response Time” column → your app is slowing down.
- If you see orange cells in the “Output” column → some requests returned nothing.
- If the Error Flag shows red → something failed, and you know exactly where to start debugging.
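If even the five-minute scan feels repetitive, you can script it. Here is a sketch that reads the same sheet with gspread and prints only the rows worth looking at; the sheet name and credentials file are placeholders, and the 2-second threshold matches the conditional-formatting rule above.

```python
import gspread

gc = gspread.service_account(filename="service_account.json")
rows = gc.open("AI Debug Dashboard").sheet1.get_all_records()  # one dict per data row

for i, row in enumerate(rows, start=2):  # row 1 is the header
    slow = float(row.get("API Response Time") or 0) > 2
    empty_output = not str(row.get("Output") or "").strip()
    errored = str(row.get("Error Flag")).strip().lower() in ("true", "error")
    if slow or empty_output or errored:
        print(f"Row {i}: slow={slow}, empty_output={empty_output}, error={errored}")
```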
Optional upgrade
If your app is bigger or has lots of requests, switch to a proper monitoring tool later, like:
- Metabase (for easy database dashboards)
- Grafana (for live monitoring and alerts)
But start simple with Google Sheets first.