
Building a Copilot prompt directory after getting frustrated with “vibe coding” in real projects

I use GitHub Copilot daily as a senior full-stack developer, mostly on backend-heavy and legacy codebases.

At some point, I realized the problem wasn’t Copilot itself; it was how inconsistent the outputs were depending on context and instructions. For migrations, refactors, and production code, “just autocomplete” wasn’t enough.

During a recent AI hackathon, our team built an MVP in one day and won first place. The biggest difference wasn’t the model or framework, but having structured prompt and instruction patterns to guide Copilot step by step.

That experience pushed me to start building CopilotHub, a small public directory where I collect:

  • prompts
  • instruction patterns
  • agents & MCPs that actually work on real codebases (not toy examples)
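For readers unfamiliar with instruction patterns: GitHub Copilot reads repository-level guidance from a `.github/copilot-instructions.md` file. Here's a hedged sketch of what one such pattern might look like for a legacy backend repo; the specific rules are illustrative, not taken from the CopilotHub directory:

```markdown
# Copilot instructions (illustrative pattern for a legacy backend repo)

## Migrations
- Never edit existing migration files; always generate a new one.
- State the rollback step for every schema change.

## Refactors
- Preserve public function signatures unless the task says otherwise.
- Output a short plan before emitting code, then wait for confirmation.
```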

I’m currently posting and iterating publicly to understand:

  • which prompts are genuinely useful
  • where Copilot helps vs hurts
  • how developers actually use “vibe coding” beyond demos

Would love feedback from others building with Copilot:
What’s the most frustrating thing you’ve hit when using AI on non-trivial projects?

https://copilothub.directory

Posted to Building in Public on December 24, 2025
  1.

    Your hackathon insight cuts right to it — the delta between 'AI that kinda works' and 'AI you can actually ship' isn't raw model capability, it's structure around the model.

    What I've found building on real legacy codebases: the inconsistency problem splits into two distinct layers. One is prompt quality (which is exactly what your directory solves — which instructions reliably produce the output you want). The other is execution consistency — even with a well-crafted prompt, the same instruction produces subtly different code across different context windows, different session states, different times of day. Both are real but they need different solutions.

    I'm curious whether your collection is already surfacing patterns around which prompt types suffer most from 'execution drift' vs. which ones are just badly written. From what I've seen, refactoring and migration prompts are the worst for drift — they seem highly sensitive to how much prior context exists in the session when they run.

    The biggest frustration I keep hitting: there's no clean way to know if a prompt 'failed' because it was a bad prompt or because execution conditions changed underneath it. A/B testing prompts feels almost meaningless when the baseline itself isn't stable.
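One way to make that distinction concrete: before A/B testing two prompts, measure how unstable a single prompt's output already is across repeated runs. A minimal sketch, assuming the repeated outputs are captured as strings (the sample runs below are hypothetical, not real Copilot output):

```python
import difflib
from itertools import combinations

def stability(outputs: list[str]) -> float:
    """Mean pairwise similarity (0..1) across repeated runs of ONE prompt.

    Low stability means the baseline itself drifts, so any A/B delta
    between two prompts smaller than that drift is noise, not signal.
    """
    if len(outputs) < 2:
        return 1.0
    ratios = [difflib.SequenceMatcher(None, a, b).ratio()
              for a, b in combinations(outputs, 2)]
    return sum(ratios) / len(ratios)

# Hypothetical repeated runs of the same refactoring prompt:
runs = [
    "def total(xs):\n    return sum(xs)",
    "def total(xs):\n    return sum(x for x in xs)",
    "def total(values):\n    return sum(values)",
]
drift = 1 - stability(runs)
print(f"baseline drift: {drift:.2f}")
```

Only prompt comparisons whose measured difference exceeds this baseline drift say anything about the prompts themselves; everything below it is execution noise.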

    What are you seeing so far? Are specific prompt categories consistently misbehaving in ways that aren't just 'needs better wording'?

  2.

    That frustration is very real — prompts feel powerful in demos but messy once real constraints show up.

    At this stage, I’ve seen clarity come from pinning down which problem the directory actually removes first: faster setup, fewer bad prompts, or more consistent outputs across projects.

    Curious — what’s the one behavior you’re watching to decide if this is worth doubling down on?
