As founders, we can be bad at testing our own products. We already know the flows. We know where to click. We know what each screen means.
New users don’t.
And that’s where most products break. Not bugs — UX.
AI can help you see your product like a stranger again.
Here’s how.
Let’s start with the simplest setup, then work up to more advanced ones.
Open your app in a private window. Pretend you're a brand-new user. Do one thing a first-time user should do, like sign up and send an invoice.
Record your screen while you do it (both macOS and Windows have this built in).
Then drop that video into any AI chat tool you use. Ask it to act like a confused new user and tell you where it would hesitate, what's unclear, and what it would expect each screen to do.
That’s it.
You’ll spot small UX issues that cost you users, and you’ll know exactly what to fix.
Now let’s make it faster and more repeatable.
This version uses test scripts to automate the clicks and lets AI do the review.
You’ll need a browser automation tool like Playwright (Cypress and similar tools work too).
(If you already have automated tests, you’re halfway there.)
The idea: run your existing test flow, capture what each screen shows along the way, and hand that capture to AI.
What to change in your test
After each step, save the text from the screen.
In Playwright, that might look like this:

const fs = require('fs');

// after each step, save the page's visible text to a running log
const body = await page.textContent('body');
fs.appendFileSync('flow-log.txt', `\nStep 3:\n${body}`);
Do that for every screen.
You’ll end up with a plain text log like:
Step 1 - Homepage
Step 2 - Signup form
Step 3 - Dashboard
Step 4 - Invoice form
Step 5 - Confirmation
Ask AI for feedback
Paste that log into ChatGPT, Claude — or any other tool of your choice — with this prompt:
Act like a new user.
Here's what the app shows, step by step.
Tell me where the flow is unclear or confusing.
Say what you would do at each step.
Now the AI reviews the flow — just like Level 1 — but with no manual clicking.
Run it wherever your tests already run, and you’ll get fast, repeatable feedback every time the test runs.
This is where the AI does everything: it drives the app itself, step by step, and reports back what confused it.
It acts like a QA engineer — but faster and tireless.
You’ll find these agents in dedicated QA platforms and in custom setups built on browser automation.
This needs more setup than Levels 1 and 2. But once it’s live, it’s powerful.
It’s worth doing when your core flows are stable and you’re testing them often.
How to set it up
You have two options:
Option 1: Use a QA platform with agents
These tools give you built-in agents.
What to do: point the agent at your app, describe the flow to test, and review what it reports.
No code needed for most of these.
Option 2: Build it yourself (if you have dev resources)
It takes time to set up, but you get full control.
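If you go the custom route, the heart of it is a loop: show the model the page text, let it choose an action, perform it, repeat. A rough sketch under heavy assumptions (Playwright as the browser driver, any chat model for decisions, and a CLICK/TYPE/DONE reply format that you would define yourself):

```javascript
// Turn the model's reply into a structured action. The reply format
// (CLICK / TYPE / DONE) is an assumption; you'd define your own protocol.
function parseAction(reply) {
  const click = reply.match(/^CLICK "(.+)"$/m);
  if (click) return { type: 'click', target: click[1] };
  const typed = reply.match(/^TYPE "(.+?)" "(.+)"$/m);
  if (typed) return { type: 'type', target: typed[1], value: typed[2] };
  if (/^DONE$/m.test(reply)) return { type: 'done' };
  return { type: 'unknown', raw: reply };
}

// The agent loop itself (not run here; page and askModel are placeholders):
//   while (true) {
//     const body = await page.textContent('body');
//     const reply = await askModel(
//       `You are a new user. Page text:\n${body}\n` +
//       'Answer with CLICK "<button text>", TYPE "<field>" "<value>", or DONE.');
//     const action = parseAction(reply);
//     if (action.type === 'done') break;
//     if (action.type === 'click') await page.getByText(action.target).click();
//     if (action.type === 'type') await page.getByLabel(action.target).fill(action.value);
//   }
```

Keeping the model's replies in a strict format like this is what makes the loop debuggable: every decision is a plain string you can log and replay.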
That's it.
Quick note: Don’t jump straight to agents. Only invest in them once your core flows are stable and Levels 1 and 2 are already paying off.
If not, you may end up automating chaos.
Very practical to turn “UX testing” into something that isn’t a research project.
I did something similar recently: I recorded my first-time flow, fed it to an AI, and asked it to act like a confused first-time user. The feedback was very direct:
I’m not sure what this button does.
I’m uncertain whether this worked.
I'm scared to click this.
It said nothing about colors or fonts. It was all about transparency and trust.
Love seeing content like this here.
Next step for me: scripting the clicks to automate it.
It mirrors how products evolve: at first you just need a fresh perspective, then you want repeatability, and sooner or later you need guardrails.
Using this before shipping changes is an underrated point. We often only notice UX problems when users churn or complain. Running this kind of review on onboarding, pricing, and first-action flows would likely catch 80% of the expensive friction.
“What would make you leave?” is a fantastic prompt. It forces you to look at the product as a stranger again.
Have you found that certain flows consistently have more problems (signup vs. onboarding vs. first task)?
Level 1 approach is a game-changer for solo developers. Just recording yourself as a fresh user and asking ChatGPT 'what would confuse you here?' catches so many small friction points that you've gone blind to.
One thing I'd add: the mistake simulation point from the comments is crucial. Most people test the happy path. But the real UX breaks when users do things slightly wrong - fill fields in unexpected order, click before pages fully load, or misunderstand what a button does. When you ask AI to narrate its confusion ("Should I click X or Y?"), that's where you find the real issues.
The Playwright approach at Level 2 is solid too for repeatable testing across feature changes. The plain text log idea is simple but genius - no need for complex reporting tools, just AI reading what users see.
One question though: How well does this catch issues that appear only under specific conditions (slow network, on mobile, etc.)? Seems like something to layer in once the basic flow is smooth.
Several people in this thread hit on the same thing: AI feedback simulates confusion. It doesn't actually experience it. A real person who has never seen your app will do things no model would predict. Click the wrong button. Misread your pricing. Get stuck somewhere you never thought to test.
We built Test by Human for this exact gap. You submit your URL, tell us what flow to test, and a real person goes through your site for the first time while screen-recording with voice narration. You get back a video showing where they hesitated, what confused them, what they skipped entirely.
I'd call it Level 0 in this framework. Before you automate anything, just watch one stranger use your product. That five-minute video will tell you more than a week of staring at analytics.
First test is free if anyone wants to try it. Find us @testByHuman on X.
This is a great breakdown, especially the reminder that founders rarely experience their own product like real users do.
I’ve seen this play out when analyzing funnels: conversion drops often aren’t caused by technical bugs, but by small clarity gaps such as unclear wording, hidden expectations, or cognitive overload during the first interaction.
The Level 1 approach is surprisingly powerful because it forces perspective-shifting before introducing automation complexity. I like the progression you outlined; it prevents teams from over-engineering validation too early.
Curious about your experience here:
Have you noticed AI feedback aligning closely with actual user behavior metrics (drop-offs, time-to-completion, etc.), or do you treat it more as directional insight rather than validation?
Thanks for sharing this, very practical framework.
The Level 1 approach is underrated. Recording yourself and asking AI to narrate confusion catches so many blind spots. But there's a gap: AI simulates how users might behave, not how they actually behave.
We built SaasFeedback (https://saasfeedback.ai/) specifically for this — real user feedback loops that capture actual friction points during onboarding. Combining it with AI simulation gives you both perspectives: what users could struggle with and what they actually do.
Curious if you've tested combining AI simulation with real user session data to validate which issues actually cause churn?
this is super practical honestly. the level 1 approach alone would catch so many issues founders just gloss over because they're too close to the product. we built something similar internally where we record first-time user sessions and it's wild how different the experience looks when you're not the one who designed it.
do you find the ai feedback is more useful on visual stuff like layout/button placement or more on copy/messaging clarity?
Level 2 resonates with me. I've been using Playwright for a tech news aggregator I'm building, and capturing the text content at each step has been eye-opening.
One thing I learned: asking AI to evaluate "what's the first thing a confused user would try to click" often reveals navigation gaps that functional tests miss entirely.
The key insight here is that AI isn't replacing user testing — it's catching the obvious stuff faster so real user conversations can focus on deeper problems.
This is great advice. Everyone talks about UX, but products usually ship ignoring exactly that, right out of the gate. I feel things will get better in general thanks to AI models offering best practices, assuming one asks... but a founder is usually so busy with so much stuff that they might not even ask.
Also an interesting idea to ask an AI for its opinion, though I'm not sure how closely that would match a regular user. And this is (I think) limited to the web, at least for now; not native apps, unless one uses screenshots (a video would likely already taint the outcome). I'll definitely run a few experiments.
That's great. I have a question: what AI tools do you recommend for UX testing?
If the app has lots of pages, how do you test them all? One by one, manually?
I imagine new issues appear as the pages grow. Thanks!
This is a really practical way to “see” your product with fresh eyes. I tried something similar by recording my own flow once, and it was shocking how many small confusions I had missed. Love how you break it into levels — super doable even for small teams.
I kinda agree . . . but also think it makes it all too mechanical versus intuitive . . .
Great breakdown. One thing I've learned from onboarding tests: the confusion usually isn't where you think it is.
Most founders test the "happy path" - when everything works. But new users get confused when something doesn't work, and they can't tell if it's their fault or a bug.
Level 1 tip: When you record yourself, try to intentionally make a small mistake (like filling a field wrong) and see what happens. That's where most users bounce.
The AI feedback works best when you ask it to narrate its internal confusion - not just "this is unclear" but "I'm not sure if I should click here or scroll down first."
putting to use immediately! thanks for the tips!