Hey IH! I'm Bikash, and over the last 4 months (started late October 2025) I built Cuneiform Chat — an AI agent platform that lets businesses deploy knowledge-base chatbots across Telegram, WhatsApp, Discord, Slack, web widgets, and more. It's live in production now.
I want to share the real story — architecture, mistakes, and what it actually looks like to build enterprise software with an AI coding partner.
The system is 9 repos — 6 backend services, 2 frontends, and a shared SDK:
6 MongoDB databases, Redis isolated by DB number per service, Pinecone for vectors with per-tenant namespace isolation, S3 for document storage.
I made it multi-tenant from day one. Every query filters by organization. Every S3 path, every Pinecone namespace, every Redis key — all scoped to the tenant.
Cost: every feature takes longer. Every test has to verify isolation. Every AI-generated change has to follow the scoping pattern.
Value: no organization's data can be reached from another org's context, by construction. For B2B, this is table stakes. Building it in from the start was far cheaper than retrofitting would have been.
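A simplified sketch of what that scoping looks like in practice (the key formats and helper names here are illustrative, not the exact production code):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TenantContext:
    """Carries the organization ID through every data-access call."""
    org_id: str

def s3_key(ctx: TenantContext, doc_id: str) -> str:
    # Every S3 object lives under the org's prefix.
    return f"orgs/{ctx.org_id}/documents/{doc_id}"

def pinecone_namespace(ctx: TenantContext) -> str:
    # One vector namespace per tenant keeps embeddings isolated.
    return f"org-{ctx.org_id}"

def redis_key(ctx: TenantContext, suffix: str) -> str:
    # Redis keys are prefixed so tenants can never collide.
    return f"org:{ctx.org_id}:{suffix}"

def scoped_query(ctx: TenantContext, query: dict) -> dict:
    # Inject the org filter into every MongoDB query unconditionally.
    return {**query, "org_id": ctx.org_id}
```

Once every read and write goes through helpers like these, "forgetting the tenant filter" stops being a class of bug you can write.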
Claude Code is my primary development partner. Not autocomplete — a collaborator that reads my codebase, understands the architecture, and writes production code.
I maintain ~30 reference docs in a .claude/ directory — architecture decisions, service patterns, API conventions, feature specs.
A solo developer with an AI coding partner can maintain a 6-microservice architecture that would normally need a team of 5-8. The tradeoff is heavy investment in documentation — not for humans, but for your AI to maintain context across sessions.
Not marketing from day one. I spent months building in silence when I should have been talking to customers and creating content from the start. The product was production-ready long before anyone knew it existed. If you're a builder, the instinct is to keep building. Fight that instinct. The best time to start marketing was the day I wrote the first line of code.
Skipping integration tests early. Unit tests caught logic bugs. But the production bugs were integration bugs — wrong field names between services, mismatched Redis keys, routes that worked in isolation but failed with auth middleware. Write integration tests from the start.
Over-engineering billing before having paying customers. I extracted billing into its own microservice with quota enforcement, credit tracking, and webhook handlers — before a single customer had paid me anything. That entire service could have been a simple Polar.sh checkout link and a boolean flag for months.
Not building in public sooner. I had a compelling story the entire time — solo dev, 6 microservices, AI coding partner — and I told nobody. Every architectural decision, every production bug, every late night debugging session was content I never published. Starting this now, months after launch, instead of from day one.
Multi-tenant from day one. Already said it. Worth repeating.
Building a custom tracing service. Every LLM call, every RAG query, every API request across all services gets traced to a centralized dashboard. When you're solo, you can't afford to spend hours hunting bugs across microservices. The tracing service with its cost tracking (24 recording points) paid for itself within the first week of production debugging. I can see exactly which step in a multi-service pipeline failed, what it cost, and how long it took. Same thinking led me to build a test dashboard that orchestrates and monitors test runs across all repos from a single UI — when you don't have a QA team, you build one. If I were starting over, these developer tools would be among the first things I'd build.
Config-as-code for tier limits. All subscription tier configs live in YAML files in the shared SDK. No database queries; the platform admin panels just display the values. Change the YAML, deploy, done. Every service reads the same config, so they can't disagree about what a "Plus" plan includes.
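A minimal sketch of the idea (the tier names, field names, and limits below are made up for illustration; the parsed YAML is shown as the dict it would load into):

```python
# Hypothetical slice of the shared SDK's tier config, as loaded from YAML:
#
#   free:
#     agents: 1
#     messages_per_month: 500
#   plus:
#     agents: 5
#     messages_per_month: 10000
TIERS = {
    "free": {"agents": 1, "messages_per_month": 500},
    "plus": {"agents": 5, "messages_per_month": 10_000},
}

def within_quota(tier: str, used_messages: int) -> bool:
    # Every service calls the same check against the same config,
    # so no two services can disagree about what a plan includes.
    limit = TIERS[tier]["messages_per_month"]
    return used_messages < limit
```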
Building Saba — the platform's own AI assistant. Saba is a meta-feature: an AI assistant that knows the platform itself. It answers customer questions about their account, usage, billing, agent configuration — using the same agent and RAG infrastructure that powers the customer-facing chatbots. It composes tools dynamically (subscription lookup, usage stats, knowledge search, configuration help) based on the question. Essentially, the platform eats its own dog food. Customers get instant self-service support, and I don't have to be available 24/7 for basic questions. For a solo founder, that's not a nice-to-have — it's survival.
Fire-and-forget for secondary operations. Tracing, analytics, audit logging — none of these block the user's request. Try the operation, log a warning if it fails, move on. User experience is sacred.
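The pattern is tiny; a minimal sketch (simplified, with a hypothetical wrapper name):

```python
import logging

logger = logging.getLogger("secondary-ops")

def fire_and_forget(op, *args, **kwargs):
    """Run a secondary operation (tracing, analytics, audit log)
    without ever letting its failure block the user's request."""
    try:
        op(*args, **kwargs)
    except Exception:
        # Swallow the error, but leave a trail so failures stay auditable.
        logger.warning("secondary op %s failed",
                       getattr(op, "__name__", op), exc_info=True)
```

Usage is just `fire_and_forget(record_trace, request_id, cost)` in the request path; the warning log is what keeps the swallowed errors from disappearing entirely.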
The system is live in production. The focus has shifted from building features to finding customers and creating content. The backlog has things like a public REST API and CRM integrations, but those are driven by customer requests, not my desire to build more things.
The hardest part of building a SaaS alone isn't the code. It's the context-switching — code, infrastructure, support, content, growth — all in the same day, every day. The system is the easy part. The business is the hard part.
If you want to see what a solo-built enterprise AI platform looks like:
There's a free tier if you want to spin up an agent and test it yourself.
What's the hardest architectural decision you've faced as a solo developer? The one where both options seemed reasonable and you just had to pick? Would love to hear your stories.
The over-engineered billing mistake resonates hard. I built a full multi-currency invoicing system for a bookkeeping tool before I had a single user. Could've been a Stripe checkout link for months.
Your config-as-code approach for tier limits in YAML is something more people should steal. I've seen so many SaaS apps where pricing tier logic is scattered across the codebase in random if-statements, and when it's time to change plans it's a terrifying deploy. Having one source of truth in a shared config is such a simple win.
The "fire-and-forget for secondary operations" pattern is the right call but I'd add one nuance from my experience: make sure you have a dead letter queue or at least a daily summary of those swallowed errors. I ran fire-and-forget on my analytics pipeline for 3 months before realizing 15% of events were silently failing due to a schema mismatch. The user experience was fine but my analytics were lying to me.
To your closing question: hardest solo dev decision for me was whether to keep everything client-side (for privacy) or add a backend for better features. Went client-side, which became a moat — small business owners processing financial data love that nothing leaves their browser. But it severely limits what you can build. Still not sure it was the right call long-term.
Great point on the dead letter queue — we actually do have failure tracking on the fire-and-forget events for some cases (the worker retries only on failure). But your 15% silent failure story is a good reminder to audit that regularly. On client-side vs backend: privacy as a moat is underrated, especially for financial data. The constraints force creative solutions.
The part about not building in public hit me. You had the most compelling story the whole time - solo dev, 6 microservices, AI as a coding partner - and nobody knew. I'm making the same mistake right now with my own build and this is the reminder I needed to just start talking about it.
Appreciate that! Yeah, building in silence for months was comfortable but a mistake in hindsight. The feedback from posts like this is already shaping the roadmap. Start sharing — even a "here's what I built this week" post goes a long way. What are you building?
Hi, I'm Ritesh, a student from India learning AI and startups. I'm here to learn and help wherever possible.
Welcome, Ritesh! Best way to learn is to build. If you're into AI + startups, pick a small problem and ship something end-to-end — you'll learn more than any course. Good luck!
The .claude/ directory approach is interesting. I've been using something similar — keeping architecture docs that the AI can reference across sessions — and it makes a massive difference vs starting from scratch every time. Without it you spend half your session re-explaining the codebase.
Your mistakes section is the most valuable part of this post honestly. The over-engineered billing thing is such a common trap. I did something similar with a permissions system once — built this whole RBAC thing with hierarchical roles before I had more than 3 users. Could've been a simple if-statement for months.
The fire-and-forget pattern for tracing/analytics is smart. I've seen too many apps where a logging failure takes down the actual user request. Seems obvious but tons of people get it wrong.
Curious about the Claude Code workflow specifically — how do you handle when it generates code that doesn't match your multi-tenant patterns? Do you catch that in code review or do the reference docs prevent it most of the time?
This was a real problem early on. The .claude/references/ docs help a lot, but the biggest fix was adding explicit rules in CLAUDE.md: "Every database query MUST include org_id filter" and "Use TenantAwareRepository from shared SDK." When Claude still misses it, the pattern is usually in a new endpoint where it copies from a non-tenant-aware example. I catch those in review. The RBAC parallel is spot on — I built a full permission system before anyone needed it beyond basic admin/member roles.
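For context, the shape of `TenantAwareRepository` is roughly this (a simplified sketch, not the actual SDK code):

```python
class TenantAwareRepository:
    """Wraps a MongoDB collection so every read/write gets the org_id
    filter injected. Generated code that uses this can't forget it."""

    def __init__(self, collection, org_id: str):
        self._collection = collection
        self._org_id = org_id

    def _scope(self, query: dict) -> dict:
        # Unconditionally add the tenant filter to the caller's query.
        return {**query, "org_id": self._org_id}

    def find(self, query: dict):
        return self._collection.find(self._scope(query))

    def update_one(self, query: dict, update: dict):
        return self._collection.update_one(self._scope(query), update)
```

The point is less the wrapper itself than the review rule it enables: any raw collection access outside the repository is automatically suspect.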
dude the claude code workflow is genuinely a cheat code for solo devs. ive been using basically the same setup to build 3 ios apps simultaneously and it remembers context between sessions which is wild
the over-engineering billing mistake is SO relatable. i built this whole subscription management system for my apps before i even had 10 users. stripe checkout + a boolean wouldve been fine for months lol
quick q - with 6 microservices how do you handle the context window? i find it struggles when the codebase gets really big across multiple services. do you work on one service at a time or somehow give it the full picture?
thanks! yeah the context window is the #1 challenge with multi-repo setups.
my approach: i maintain a .claude/references/ directory with curated markdown docs covering architecture, features, and patterns across all services. these aren't auto-loaded — claude reads them on-demand based on what area it's working in. so if it's fixing a RAG bug, it pulls in the RAG pipeline docs; if it's touching billing, it grabs the billing docs. keeps the context focused.
the CLAUDE.md file acts as an index/router — it has a "quick decision tree" that maps task types to repos and a reference docs index so claude knows which doc to read for what. that way it doesn't need the full picture of every repo at once, just the relevant slice.
for cross-service work i'll spin up parallel explore agents to search multiple repos simultaneously, then synthesize. but honestly 80% of tasks are scoped to 1-2 services.
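roughly what the router section looks like (paraphrased excerpt, not the literal file — the doc names are illustrative):

```markdown
## Quick decision tree
- bug in chat responses / retrieval → rag service (read references/rag-pipeline.md)
- quota, credits, webhooks → billing service (read references/billing.md)
- widget rendering / embed issues → web widget (read references/widget.md)

## Reference docs index
- references/architecture.md — service boundaries, data flow
- references/multi-tenancy.md — org_id scoping rules
```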
Did you get your first paid client yet?
The "documentation for your AI, not for humans" point is underrated. I've seen this pattern too — the quality of AI coding output is directly proportional to how well your codebase context is documented. It's basically prompt engineering at the repo level.
Also +1 on over-engineering billing. It's the most common premature optimization I see on IH. Stripe checkout + a boolean beats a billing microservice until you have actual paying users complaining about actual billing problems.
To your closing question: my hardest solo dev decision was always "build the developer tooling now or ship the feature." Sounds like you'd say tooling first (tracing, test dashboard). Curious — how long did the tracing service take to build, and at what point did it start saving more time than it cost?
Really appreciate the thoughtful comment!
"Prompt engineering at the repo level" — that's a much better way to put it. Stealing that phrase.
On billing — you're right, and I probably fell into that trap myself. The billing service is fairly involved (Polar.sh, quota tracking, credits). A Stripe checkout + boolean would've gotten me to first paying customers faster. Fair point.
On the tooling question — the tracing service took about 2-3 days to build, started paying off almost immediately. First time I had to debug a request flowing through 4 services, having one dashboard showing the full chain with timings and costs vs SSH-ing into logs across containers — night and day difference.
But the real ROI isn't time saved per session, it's confidence. When you're solo and something breaks, knowing you can pinpoint the exact failing step in under a minute changes how aggressively you ship.
That said — I'd still tell someone else to ship the feature first and add tooling when the pain is real. I just felt that pain early because microservices multiply debugging complexity fast.
"Not marketing from day one" - every technical founder's mistake, including mine.
Bikash, the architecture is impressive, but this line hit harder: "The product was production-ready long before anyone knew it existed."
Currently making the same mistake with book-digest.com (AI book summaries). Spent 4 months perfecting the summarization pipeline, adding 6 languages, building admin panels. Started marketing in month 5.
Your multi-tenant-from-day-one decision is the kind of architectural choice that saves you 6 months of pain later. Retrofitting that would be a nightmare.
Ha, glad that line resonated — it's painfully true. 4 months perfecting before marketing is exactly the trap. The engineering brain says "just one more feature" while the market doesn't even know you exist yet.
book-digest.com sounds cool — 6 languages is no joke for a summarization pipeline. Curious what LLM you're using under the hood?
And yeah, multi-tenant from day one was the best decision I made. org_id on every query, tenant-aware repositories, isolated configs — boring plumbing that pays off massively when you onboard your second customer and nothing breaks.
Impressive architecture, Bikash! The 6-service split makes a lot of sense for this kind of multi-channel platform.
Curious about your experience with Claude Code as a co-developer — at what point did you find it most valuable? I have been building a desktop tool for managing AI coding agents (orchestrating Claude Code, Codex CLI, etc. in parallel) and the persistent context / memory aspect has been the biggest unlock. Did you find Claude Code was better at certain types of services vs. others?
Also, the <11KB widget is wild — Preact with Shadow DOM isolation is such a solid choice for embeddables. How is latency on the RAG hybrid search in production?
Thanks!
Claude Code as co-developer: Most valuable on backend/infrastructure — RAG pipeline, billing extraction, webhook handling. It holds full service context while you iterate, which is huge for interconnected logic. Less strong on frontend polish (Next.js), but gets you 80% there. The persistent context via CLAUDE.md files is key — each session picks up without re-explaining the architecture.
Widget: Shadow DOM was non-negotiable for style isolation. Preact + Vite IIFE mode with inlined CSS-in-JS means zero external requests beyond the single script tag.
RAG latency: Hybrid search with reranking typically under 2s end-to-end. Redis caching on similar queries helps a lot. Contextual enrichment during ingestion (chunk-level summaries) was the biggest win — better relevance means fewer chunks per query, which actually reduced latency.
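The "similar queries" caching is roughly this shape (an illustrative sketch; here "similar" just means case/whitespace-normalized, and a plain dict stands in for the Redis GET/SETEX calls):

```python
import hashlib

def cache_key(org_id: str, query: str) -> str:
    # Normalize before hashing so trivially similar queries share a key.
    normalized = " ".join(query.lower().split())
    digest = hashlib.sha256(normalized.encode()).hexdigest()[:16]
    return f"org:{org_id}:ragcache:{digest}"

def answer(org_id: str, query: str, cache: dict, run_pipeline):
    key = cache_key(org_id, query)
    if key in cache:                 # stand-in for a Redis GET
        return cache[key]
    result = run_pipeline(query)     # hybrid search + rerank (the slow path)
    cache[key] = result              # stand-in for a Redis SETEX with a TTL
    return result
```

Note the key is still scoped per org, so cached answers never leak across tenants.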
Your parallel agent orchestration tool sounds interesting — would love to hear more!
This is very interesting; I am trying to do something similar. I especially love your idea about cost accounting. I will do that now.
Cheers mate, best wishes.
This is quite impressive.
The thing I've noticed is that after a launch there are many iterations before things really stabilize for customers.
Which features have evolved the most for you?
Great question. The RAG pipeline has gone through the most iterations by far — started with basic vector search, now it's hybrid search with query decomposition, reranking, and confidence scoring. Every time real users hit it with edge cases, something needed fixing. The agent orchestration layer is second — especially around context window management for long conversations.
I actually just wrote a deep dive on the RAG pipeline iterations specifically — the 7 things that changed from tutorial to production: https://dasbikash.substack.com/p/rag-in-production-is-nothing-like
What's your experience with post-launch iteration cycles?
Impressive build — 6 microservices solo with AI pair programming is no joke.
The "not marketing from day one" and "over-engineering billing before paying customers" hit home. I made the same $20K mistake: built for 3 months, zero sales.
Now I'm focused on validation before code. You mentioned REST API and CRM integrations are backlog items "driven by customer requests" — how are you validating which integration (Salesforce? HubSpot? Slack?) is worth the engineering effort before a customer explicitly asks for it?
I'm testing a method to validate B2B features with video prototypes — show the workflow, test demand, before building the microservice.
Would love to hear how you're tackling prioritization now, or if you'd be interested in testing this approach on your next integration.
Honest answer: I didn't validate well enough before building. The REST API and CRM integrations are sitting there unvalidated. What I've started doing is writing the docs/landing page copy first — if I can't explain why someone would want it in 2 sentences, it probably shouldn't be built yet. Video prototypes sound like a smarter approach. Would be interested to compare notes on what works.
"Docs first" is a good filter, but you're right — video prototypes validate the demand, not just the clarity.
Since you mentioned REST API/CRM integrations are sitting there unvalidated — want to test this?
I can build a 3-minute video showing "Salesforce integration workflow" for Cuneiform Chat (or whichever integration you're considering), you show it to 5-10 prospects, and we see if they'd actually pay for it before you build the microservice.
Takes me 3 hours, costs me €50 (AI video generation), you get the validation data.
If it works, you have proof of demand. If it doesn't, you just saved weeks of microservice architecture.
Interested? Email me: [email protected] and I'll send examples of what this looks like.
Either way, impressive build — 6 microservices solo is serious work.
Appreciate the offer! The video prototype approach is clever for demand validation. I'm not prioritizing CRM integrations right now — focused on core platform and early customer feedback first. But I'll keep it in mind if I need to validate a specific integration down the line.
Cheers!
Totally makes sense — core platform first, integrations later. Smart sequencing.
If you ever hit a point where you're debating "should we build X before Y" for the core platform, happy to help you test that with a quick video prototype. No commitment, just a 3-hour validation sprint.
Either way, following Cuneiform's journey. Good luck with the early customers — that's where the real learning happens!
Cheers
The multi-tenancy decision is fascinating. I love how you articulated the tradeoff: "every feature takes longer" but retrofitting would be "far more expensive." That's the kind of architectural bet that separates production-grade SaaS from side projects.
Your point about building Saba (the AI assistant for your AI platform) is brilliant - dogfooding at its finest. For solo founders, automating tier-1 support isn't just nice-to-have, it's survival like you said.
Hardest architectural decision I faced: whether to build a custom permissions system vs using a hosted auth provider like Auth0. The custom route gave us fine-grained control but cost us weeks. In hindsight, starting with the hosted solution and migrating later might have let us validate the core product faster.
The tracing service you built sounds like it paid for itself immediately. Observability is usually an afterthought for solo devs, but you made it first-class. Smart move.
The permissions tradeoff is real. I went custom too and it delayed getting to market. In hindsight, starting with Auth0 plus basic roles, and only customizing once a paying customer actually needed it, would have been the better call. The tracing service has been worth every hour though — when something breaks in a 6-service pipeline, having centralized traces with cost tracking is the difference between debugging for 10 minutes vs 2 hours. What did you end up going with for auth?