Voice Tables by Inithouse: Replacing Forms With a Microphone, Two Months of Real Numbers

The thesis behind Voice Tables (https://voicetables.com) is simple: most people who need databases hate building them. Field workers, craftsmen, sales reps, event planners. They need structured data. They don't need a spreadsheet interface.

We asked: what if you could just say "new job, client Martin, address Korunni 42, bathroom renovation, starts July 3" and get a populated row in a table? Whisper transcribes, an LLM parses the intent and maps it to columns, the row appears. No forms, no dropdowns, no fat-fingering cells on a phone screen.

That's what we're building at Inithouse. Our studio ships a growing portfolio of products in parallel, all at different stages of finding product-market fit.

The pipeline

The core is a Whisper + LLM function-calling chain. The browser's MediaRecorder API captures the audio, Whisper transcribes it, and an LLM with function calling extracts structured fields from the transcript. The schema comes from the table definition the user already created: if you have columns [client, address, job_type, start_date], the LLM maps the spoken input to those fields automatically.
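Here is a minimal TypeScript sketch of the schema step. It assumes a simple column model (a name plus a text, date or number type) and the JSON-Schema-style tool definition that most function-calling APIs accept. `buildRowTool`, `toRow` and the column types are hypothetical, not Voice Tables' actual code. The model call itself is left out, so the example passes `toRow` a hard-coded arguments string of the kind a model might return for the utterance from the intro.

```typescript
// Hypothetical column model; the real Voice Tables schema may differ.
type ColumnType = "text" | "date" | "number";

interface Column {
  name: string;
  type: ColumnType;
}

interface PropertySchema {
  type: string;
  format?: string;
  description: string;
}

// JSON-Schema-style tool definition (name, description, parameters),
// the general shape function-calling LLM APIs accept.
interface ToolDefinition {
  name: string;
  description: string;
  parameters: {
    type: "object";
    properties: Record<string, PropertySchema>;
    required: string[];
  };
}

// Mirror the user's columns as tool parameters, so the model is steered
// toward fields that actually exist in the table.
function buildRowTool(tableName: string, columns: Column[]): ToolDefinition {
  const properties: Record<string, PropertySchema> = {};
  for (const col of columns) {
    properties[col.name] =
      col.type === "date"
        ? { type: "string", format: "date", description: `${col.name} as YYYY-MM-DD` }
        : { type: col.type === "number" ? "number" : "string", description: col.name };
  }
  return {
    name: `add_row_${tableName}`,
    description: `Add one row to the "${tableName}" table from a spoken description.`,
    // Nothing is required: missing fields become empty cells the user can fill in.
    parameters: { type: "object", properties, required: [] },
  };
}

type Row = Record<string, string | number | null>;

// Turn the model's JSON arguments into a row: unknown keys are dropped,
// malformed or missing values become null instead of failing the whole row.
function toRow(columns: Column[], rawArgs: string): Row {
  const parsed: unknown = JSON.parse(rawArgs);
  const args: Record<string, unknown> =
    typeof parsed === "object" && parsed !== null ? (parsed as Record<string, unknown>) : {};
  const row: Row = {};
  for (const col of columns) {
    const value = args[col.name];
    if (col.type === "number") {
      const n =
        typeof value === "number" ? value
        : typeof value === "string" && value.trim() !== "" ? Number(value)
        : NaN;
      row[col.name] = Number.isFinite(n) ? n : null;
    } else if (col.type === "date") {
      row[col.name] = typeof value === "string" && /^\d{4}-\d{2}-\d{2}$/.test(value) ? value : null;
    } else {
      row[col.name] = typeof value === "string" && value.trim() !== "" ? value.trim() : null;
    }
  }
  return row;
}

// Usage with the columns from the pipeline description.
const jobColumns: Column[] = [
  { name: "client", type: "text" },
  { name: "address", type: "text" },
  { name: "job_type", type: "text" },
  { name: "start_date", type: "date" },
];
const jobTool = buildRowTool("jobs", jobColumns);
// Hard-coded stand-in for the model's arguments; the year is illustrative,
// since the speaker only said "July 3".
const jobRow = toRow(
  jobColumns,
  '{"client":"Martin","address":"Korunni 42","job_type":"bathroom renovation","start_date":"2025-07-03"}',
);
// jobRow → { client: "Martin", address: "Korunni 42", job_type: "bathroom renovation", start_date: "2025-07-03" }
```

Dropping unknown keys and nulling malformed values, instead of rejecting the whole row, fits a low-stakes correction flow. The user sees the row and fixes one cell rather than repeating the whole utterance.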

We went with browser-native audio capture instead of a native app. The bet: a PWA that works offline and syncs later covers 80% of the field-worker use case without App Store friction. Whether that holds remains to be seen.
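The "works offline and syncs later" half of the bet comes down to a local queue that is flushed when a connection returns. This is a sketch under stated assumptions: the names are hypothetical, storage is an in-memory array where a real PWA would use IndexedDB, and the network call is injected as a `send` function so the logic runs outside a browser.

```typescript
// Hypothetical offline-first row queue. A real PWA would persist `pending`
// in IndexedDB; an in-memory array keeps the sketch self-contained.
interface PendingRow {
  id: string; // client-generated, so a server can de-duplicate a row sent twice
  table: string;
  values: Record<string, unknown>;
}

type SendFn = (row: PendingRow) => Promise<void>;

class OfflineRowQueue {
  private pending: PendingRow[] = [];
  private flushing = false;
  private readonly send: SendFn;

  constructor(send: SendFn) {
    this.send = send;
  }

  // Rows are accepted locally whether or not the device is online.
  enqueue(row: PendingRow): void {
    this.pending.push(row);
  }

  get size(): number {
    return this.pending.length;
  }

  // Send rows oldest first. On the first failure, stop and keep that row and
  // everything after it for the next attempt. Returns how many rows were sent.
  async flush(): Promise<number> {
    // Ignore overlapping calls (e.g. an "online" event firing during a startup flush).
    if (this.flushing) return 0;
    this.flushing = true;
    let sent = 0;
    try {
      while (this.pending.length > 0) {
        try {
          await this.send(this.pending[0]);
        } catch {
          break;
        }
        this.pending.shift();
        sent++;
      }
    } finally {
      this.flushing = false;
    }
    return sent;
  }
}
```

In the browser, `flush()` would typically be called from a `window.addEventListener("online", ...)` handler and again on app start, because the `online` event alone doesn't guarantee the server is reachable.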

What the numbers actually look like

Two months in. Around a dozen signups. 14 tables created across those accounts. Voice input usage is lower than we expected, which points to one of three explanations: the mic button isn't prominent enough in the UI, the onboarding doesn't make voice the obvious first action, or people come for the "AI workspace" positioning and default to typing because that's what they know.

All three are testable. Our next sprint is a redesign that puts the microphone front and center, above the table, with a prompt for first-time users that says "just talk to add a row."

Zero paid subscribers on the $19/mo Plus tier so far. Expected at this volume. The conversion question matters though: is voice input the feature that gets people to pay, or is it the real-time collaboration and offline sync? A dozen signups isn't the sample to answer that.

What we got wrong early

We built 11 solutions pages targeting different personas, including Craftsmen, Sales Reps, Real Estate, Freelancers, Small Business, Event Planners, Fitness Coaches, Consultants, Students, and Creators. The logic was that voice input helps anyone who logs data on the go. True in theory, terrible for positioning: when you target 11 personas, your landing page speaks to none of them.

Compare that with how we approached Pet Imagination (https://petimagination.com): one use case (AI pet portraits), one audience (pet owners), one action (upload photo, pick style, download). Clean signal, clear conversion path.

Voice Tables has 11 personas fighting for the same landing page real estate. The next experiment is narrowing to craftsmen and field workers specifically: the people who literally can't type because their hands are dirty or full.

The voice-first thesis, bigger picture

We see voice-first interfaces as a category shift for B2B tools, not a feature. The smartphone keyboard is 2008 technology. Speech-to-text combined with LLMs can now turn freeform speech into structured data reliably enough to replace forms in low-stakes entry scenarios: job logging, inventory counts, client notes. That's a meaningful wedge if the UX is right.

We're testing this at Voice Tables, but the pattern has implications across our portfolio. At Be Recommended (https://berecommended.com), users already interact with AI outputs in conversational format. The step from reading AI output to speaking AI input is smaller than most builders realize.

Decision framework: when voice-first makes sense

Not every product should be voice-first. From what we've observed, voice input beats typing when:

The user's hands are occupied (field work, cooking, driving)
The data is freeform but needs structure (converting notes to rows, not filling row templates)
The environment is mobile-first and the input is short, under 30 seconds of speech
The cost of a wrong parse is low (you can see the row and correct it quickly)

When precision is critical (financial data, code), or when the input is already highly structured (dates from a picker, predefined dropdowns), forms still win.
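One way to make the checklist above concrete is a small decision function. The field names mirror the list and the 30-second threshold comes from it. How the conditions combine is an interpretation, not a tested rule: precision-critical or pre-structured input is a hard stop, cheap correction is required, and then either busy hands or short, freeform mobile input tips it to voice.

```typescript
// Hypothetical encoding of the checklist; field names follow the list above.
interface EntryContext {
  handsBusy: boolean;            // field work, cooking, driving
  freeformInput: boolean;        // notes to convert into rows, not a row template to fill
  mobile: boolean;
  expectedSpeechSeconds: number;
  wrongParseCheapToFix: boolean; // the user sees the row and can correct it quickly
  precisionCritical: boolean;    // financial data, code
  alreadyStructured: boolean;    // dates from a picker, predefined dropdowns
}

type InputMode = "voice" | "form";

const MAX_SPEECH_SECONDS = 30;

function chooseInputMode(ctx: EntryContext): InputMode {
  // Hard stops: forms win when precision matters or the input is already structured.
  if (ctx.precisionCritical || ctx.alreadyStructured) return "form";
  // Voice is only worth it if a wrong parse is cheap to correct.
  if (!ctx.wrongParseCheapToFix) return "form";
  const shortMobileEntry = ctx.mobile && ctx.expectedSpeechSeconds < MAX_SPEECH_SECONDS;
  // Busy hands alone justify voice; otherwise require short, freeform, mobile input.
  return ctx.handsBusy || (ctx.freeformInput && shortMobileEntry) ? "voice" : "form";
}

// A tradesperson logging a job on site.
const onSiteContext: EntryContext = {
  handsBusy: true,
  freeformInput: true,
  mobile: true,
  expectedSpeechSeconds: 15,
  wrongParseCheapToFix: true,
  precisionCritical: false,
  alreadyStructured: false,
};
const onSite = chooseInputMode(onSiteContext); // → "voice"

// The same person entering an invoice amount.
const invoiceEntry = chooseInputMode({ ...onSiteContext, precisionCritical: true }); // → "form"
```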

What's next

Three things in July:

Mic-first onboarding: new users see the microphone before they see the table grid
Narrowed persona: a single landing page for craftsmen, with the other 10 solutions pages removed temporarily
Granular voice tracking: event logging at each step of the voice-to-row flow so we know exactly where drop-off happens
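For the drop-off tracking, here is a sketch of the analysis side. Given one event per step per session, it counts how many sessions reach each step and what share of the previous step's sessions they represent. The step names are hypothetical placeholders for the voice-to-row flow, because the real event schema isn't specified here.

```typescript
// Hypothetical step names for the voice-to-row flow, in order.
const STEPS = ["mic_tapped", "recording_stopped", "transcribed", "row_parsed", "row_saved"] as const;
type Step = (typeof STEPS)[number];

interface FlowEvent {
  sessionId: string;
  step: Step;
}

interface StepStats {
  step: Step;
  sessions: number;     // distinct sessions that reached this step and every earlier one
  fromPrevious: number; // sessions / previous step's sessions (first step: 1, or 0 if empty)
}

function funnel(events: FlowEvent[]): StepStats[] {
  // Collect the set of steps each session reached.
  const stepsBySession = new Map<string, Set<Step>>();
  for (const e of events) {
    let reached = stepsBySession.get(e.sessionId);
    if (!reached) {
      reached = new Set<Step>();
      stepsBySession.set(e.sessionId, reached);
    }
    reached.add(e.step);
  }

  const stats: StepStats[] = [];
  STEPS.forEach((step, i) => {
    // Require all earlier steps too, so a stray late event can't inflate later counts.
    const required = STEPS.slice(0, i + 1);
    let count = 0;
    stepsBySession.forEach((reached) => {
      if (required.every((s) => reached.has(s))) count++;
    });
    const previous = i === 0 ? count : stats[i - 1].sessions;
    stats.push({ step, sessions: count, fromPrevious: previous === 0 ? 0 : count / previous });
  });
  return stats;
}
```

The step with the lowest `fromPrevious` value is where the flow loses people first.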

If you're building anything voice-first, or have tried adding voice to an existing tool, we'd genuinely like to hear how it went. The established playbook for voice UX in SaaS basically doesn't exist yet.

Building this in public at Voice Tables (https://voicetables.com).