Plenty of big companies are moving fast with AI.
But they often don’t ship the strongest versions of certain AI products — not because they’re slow, but because they’re constrained.
They optimize for:
"One product for millions of users"
Protecting their brand
Legal/regulatory exposure
Enterprise procurement and security reviews
Internal alignment and incentives
This creates gaps.
You can build something small and specific that works well for a single task.
Here’s how to find the gaps.
Start with a real job title.
Pick one role and one recurring workflow.
For example: an SDR inside HubSpot, working on outbound prospecting.
This gives you a clear place to start looking for gaps.
AI works best when the work is:
If none of these are true, the “gap” usually won’t be strong enough.
Pick one tool your user already uses.
A general AI can do a lot as a one-off if you paste everything in.
The gap is when it needs to work every week, for a team, inside the real tool, with little cleanup.
Now do this:
A) Write down the real deliverable
Ask: What do they hand to someone else when the work is done?
If you can’t name the deliverable, you don’t have a product idea yet.
B) Check three places in the tool
Ask: “Can this tool make the deliverable inside the tool, with almost no cleanup, every time?” If not, you’ve found a gap. Now, name why.
C) Name the reason
Most general tools fail because of one of the following:
Pick the biggest one.
D) Write 3–5 lines like this:
“Tool does X, but fails at Y. I’ll build Y for [role] in [tool].”
E) Score each idea. Add 1 point if:
Pick the one with 4–5 points.
Do this (in order):
Stop when 2–3 people say, “Yes, I’ll try it,” and share real inputs. If one of them agrees to a small paid pilot, even better!
The deliverable test is underrated. We killed half our roadmap once we asked what actually gets handed off at the end. The biggest gap we found was workflow breaks: AI does each step fine in isolation but can't chain them inside tools without manual glue.
The failure mode you listed first, "rules: it doesn't follow them the same way each time", is almost always a prompt structure problem. When constraints are buried inside the objective and context as one block of prose, the model treats them as soft preferences and weighs them differently run to run.
Separating constraints into a dedicated typed block changes that. The model parses them independently. Rules stop drifting across sessions.
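To make the contrast concrete, here's a minimal sketch of the two structures. This is plain Python string-building; the function name and XML tags are illustrative assumptions, not any particular tool's format:

```python
# Constraints buried in prose: the rules sit inside the objective as one
# block of text, so the model can weigh them as soft preferences.
prose_prompt = (
    "Summarize the meeting notes below. Keep it under 100 words, "
    "never mention client names, and use bullet points. Notes: ..."
)

# Constraints lifted into a dedicated block: each rule is a separate,
# explicitly tagged item the model can parse independently.
def build_prompt(objective: str, context: str, constraints: list[str]) -> str:
    rules = "\n".join(f"  <rule>{c}</rule>" for c in constraints)
    return (
        f"<objective>{objective}</objective>\n"
        f"<constraints>\n{rules}\n</constraints>\n"
        f"<context>{context}</context>"
    )

prompt = build_prompt(
    objective="Summarize the meeting notes.",
    context="Notes: ...",
    constraints=[
        "Keep the summary under 100 words.",
        "Never mention client names.",
        "Use bullet points.",
    ],
)
print(prompt)
```

Same information in both versions; only the second one gives each rule its own addressable slot.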
I've been building flompt for exactly this: a visual prompt builder that decomposes prompts into 12 semantic blocks and compiles to Claude-optimized XML. Open source: github.com/Nyrok/flompt
starting small is the way, 100% agree