We built an AI that scaffolds on our own framework instead of raw code — was that a mistake?

by Marc

We've been building kumiko-studio — an AI layer that generates features specifically for our own SaaS framework (Kumiko), not generic code.

The idea: instead of teaching a general AI your patterns every session, you adopt an opinionated framework once, and the AI already knows all the conventions — auth, multi-tenancy, billing, notifications, all wired up consistently.

The upside is real: no "how do I structure this?" sessions, no inconsistent patterns across features, generated code drops straight into the project.

The downside: you're locked into the framework's opinions. If the framework has a bad abstraction, the AI doubles down on it. And onboarding someone who doesn't already know the framework is harder.

Curious if others have gone this route — AI prompted specifically for your own stack — or if you think general AI + good AGENTS.md is the better long-term bet.

Marc

on June 26, 2026

@aryan_sinh "strategic cost of evolving opinions" — that's the exact framing we've been wrestling with. Every time an opinion changes in Kumiko, it's not just a migration, it's a trust question for anyone who built on top of it.

Happy to dig into this properly. Reach us at [email protected] if you want to continue.

marc_kumiko123 · 2 days ago
  
  Just sent it over to [email protected].
  
  Looking forward to hearing your thoughts once you've had a chance to read it.
  
  aryan_sinh · 2 days ago

@aryan_sinh "opinion compounds" is exactly the right frame — and that's the pressure. We've had to change opinions in Kumiko a few times already (how multi-tenant DB scoping works, how sessions are structured) and the migration guide becomes load-bearing fast.

@galdayan on migrations: the generated code actually behaves better than I expected here. Because it follows patterns exactly, codemods work reliably — no hand-rolled variations that break the script. The risk is the opposite: if a bad assumption gets generated at scale, you need to catch it before it spreads. That's why type-checking and CI guards matter more, not less, in an AI-heavy workflow.
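
To make the codemod point concrete, here is a minimal sketch of the idea, not Kumiko's actual tooling. The call shapes `db.query(tenantId, ...)` and `db.forTenant(tenantId).query(...)` are hypothetical names invented for illustration. It shows why uniform generated code migrates cleanly, and how a simple CI guard can flag any variation the codemod missed.

```python
import re

# Hypothetical old and new call shapes; neither is confirmed Kumiko API.
OLD_CALL = re.compile(r"\bdb\.query\(\s*tenantId\s*,\s*")
NEW_CALL = "db.forTenant(tenantId).query("


def migrate_source(source: str) -> tuple[str, int]:
    """Rewrite every old-style call; return the new source and the rewrite count."""
    return OLD_CALL.subn(NEW_CALL, source)


def find_stragglers(source: str) -> list[int]:
    """CI guard: 1-based line numbers that still contain an old-style call."""
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if "db.query(" in line
    ]
```

Code that follows the pattern exactly is rewritten by `migrate_source`; a hand-rolled variation such as `db.query(ctx.tenant.id, sql)` is left alone and then reported by `find_stragglers`, which is the kind of check that could fail a CI run.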

marc_kumiko123 · 2 days ago
  
  That's exactly the tradeoff I found interesting.
  
  The migration burden isn't just a technical consequence—it quietly changes the strategic cost of evolving the framework's opinions over time.
  
  I think there's a broader business decision sitting underneath that, but it's probably too much to unpack properly here.
  
  Happy to continue over email if it's useful.
  
  aryan_sinh · 2 days ago

The observability point is real and we feel it. Right now our main signal is "did it compile and pass type-check" — not the most granular feedback loop.

What we've noticed though: the framework's structure makes the routing more predictable. Boilerplate (nav items, CRUD handlers, config keys) is almost always fast and cheap. The harder parts (schema migrations, billing logic, auth flows) are predictable too: we know upfront those need more careful generation and review. So in practice the "which model for which task" decision becomes fairly static per task type rather than dynamic per request.

Whether that's observability or just knowing your framework well enough to predict the cost — probably both.
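
A static per-task-type routing table of the kind described above could be as small as this. It is a sketch under assumptions: the task-type keys mirror the examples in this comment, while the `Tier` names and the "unknown types take the careful path" default are illustrative choices, not how kumiko-studio is confirmed to work.

```python
from enum import Enum


class Tier(Enum):
    FAST = "fast"        # cheap model, no mandatory review
    CAREFUL = "careful"  # stronger model plus human review


# Task types taken from the comment above; the tier assignment is illustrative.
ROUTES = {
    "nav_item": Tier.FAST,
    "crud_handler": Tier.FAST,
    "config_key": Tier.FAST,
    "schema_migration": Tier.CAREFUL,
    "billing_logic": Tier.CAREFUL,
    "auth_flow": Tier.CAREFUL,
}


def route(task_type: str) -> Tier:
    """Look up the tier for a task type; unknown types default to the careful path."""
    return ROUTES.get(task_type, Tier.CAREFUL)
```

Defaulting unknown task types to the careful tier trades some cost for safety, which seems the better failure mode when a new kind of task shows up.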

marc_kumiko123 · 2 days ago

I like the “opinionated framework once, AI already knows the conventions” direction. The tradeoff I would watch is not only lock-in, but observability: once the AI is generating whole features, you want to know which parts of the workflow were cheap/repeatable and which parts required a stronger model, retries, or human review.

In practice, framework-specific generation probably makes model routing more important, not less. Boilerplate scaffolding can use a cheaper, faster path, while schema changes, billing logic, auth, or migration code may deserve a deeper model and stricter traceability.

That is the angle we keep building around with Tokens Forge: AI-powered products need access to multiple models, but also a ledger that explains which workflow, route, model, fallback, and balance bucket paid for each run. Otherwise it is hard to tell whether the framework is compounding efficiency or just hiding API spend behind faster generation.
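
As a rough illustration of what such a ledger might record, here is a minimal sketch. The field names follow the list in the paragraph above (workflow, route, model, fallback, balance bucket), but the `LedgerEntry` shape and the `spend_by` helper are assumptions made for this example, not the Tokens Forge schema or API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LedgerEntry:
    """One generation run; all fields are illustrative, not a real schema."""
    workflow: str        # e.g. "scaffold_feature"
    route: str           # e.g. "fast" or "careful"
    model: str           # placeholder model name
    used_fallback: bool  # True if the first-choice model failed
    balance_bucket: str  # which budget paid for the run
    cost_cents: int      # integer cents avoid float rounding


def spend_by(entries: list[LedgerEntry], field: str) -> dict[str, int]:
    """Total cost in cents grouped by one ledger field, such as 'workflow'."""
    totals: dict[str, int] = defaultdict(int)
    for entry in entries:
        totals[str(getattr(entry, field))] += entry.cost_cents
    return dict(totals)
```

Calling `spend_by(entries, "route")` would then show how much of the spend went through the cheap path versus the careful one, which is the question the paragraph above raises.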

tokensforge · 3 days ago

Both points land. On "do opinions compound" — yes, that's exactly the bet. And the evolution concern is real, but in practice the AI reads the actual codebase each time, so when we refactor a pattern it just picks it up. No AGENTS.md drift.

On the small-team tradeoff: the one-time onboarding friction is front-loaded now because the framework is documented. After that, new features aren't "teach the AI your patterns" — they're just "describe what you need." The framework handles the architectural decisions.

marc_kumiko123 · 4 days ago

I don't think the question is whether it's more opinionated—it's whether the opinion compounds. If the framework consistently removes repetitive architectural decisions, that seems like a feature rather than a limitation. The harder part is making sure those opinions can evolve as the framework matures.

aryan_sinh · 4 days ago

This is exactly the right tradeoff for a small team shipping consistently. The consistency win you're describing - no "how do I structure this?" debates, code dropping in ready to wire - that's the compounding advantage. The onboarding cost is real but it's a one-time friction vs ongoing velocity tax. I'm curious: as you evolve Kumiko's opinions, how do you handle migrations? Does the AI-generated code become harder to refactor if it makes assumptions that get deprecated?

galdayan · 4 days ago