Quick update for those following the RONTGEN build:
I just shipped two features that let the entire writing workflow run locally — no external API calls, no data leaving your machine.
Local voice-to-text in the browser
The transcription model now runs inside the browser tab. On first use, the model is downloaded from our servers and cached; after that, everything happens on your hardware.
WebGPU acceleration is supported on Apple Silicon (via Metal) and on modern NVIDIA/AMD GPUs. On everything else it falls back to CPU: slower, but functional, and it works completely offline once the model is cached.
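For anyone curious what the load-and-fall-back pattern looks like, here's a minimal sketch using Transformers.js. The model ID and function names are illustrative, not RONTGEN's actual source:

```ts
import { pipeline } from '@huggingface/transformers';

// Use WebGPU when the browser exposes it; otherwise fall back to the
// WASM (CPU) backend. `navigator.gpu` is the standard feature check.
const device = 'gpu' in navigator ? 'webgpu' : 'wasm';

// The first call downloads the model and caches it via the browser's
// Cache API, so subsequent loads work fully offline.
const transcriber = await pipeline(
  'automatic-speech-recognition',
  'onnx-community/whisper-base', // illustrative model ID
  { device },
);

// `samples`: 16 kHz mono PCM from the recorder (e.g. via an AudioContext).
export async function transcribe(samples: Float32Array): Promise<string> {
  const output = await transcriber(samples);
  return Array.isArray(output) ? output.map((o) => o.text).join(' ') : output.text;
}
```

The caching comes for free: Transformers.js stores downloaded weights in the browser's Cache API by default, which is what makes the offline behavior above possible.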
Available on all plans, including free. No API key required.
Ollama support in agent pipelines
If you already run Ollama locally, RONTGEN now detects it automatically. Any model you've pulled — LLaMA, Mistral, Gemma, Phi — shows up in the agent selector next to cloud models.
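Detection is less magic than it sounds. Ollama serves a local HTTP API on port 11434 by default, and GET /api/tags is its documented endpoint for listing pulled models. A rough sketch of the probe (the function name and error handling are mine, not RONTGEN's exact code):

```ts
const OLLAMA_URL = 'http://localhost:11434'; // Ollama's default port

export async function detectOllamaModels(): Promise<string[]> {
  try {
    // GET /api/tags lists the models you've pulled locally.
    const res = await fetch(`${OLLAMA_URL}/api/tags`);
    if (!res.ok) return [];
    const data = (await res.json()) as { models: { name: string }[] };
    return data.models.map((m) => m.name); // e.g. "llama3:8b", "mistral:latest"
  } catch {
    return []; // connection refused: Ollama isn't running
  }
}
```

One wrinkle when probing from a browser page: Ollama's OLLAMA_ORIGINS setting has to allow your app's origin, or the request gets blocked by CORS.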
You can chain a full pipeline: record audio → transcribe in-browser → post-process with a local model → formatted document in the editor. Zero external calls.
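Under stated assumptions (the transcribe sketch above, a pulled model, and Ollama's documented /api/generate endpoint), the whole chain is only a few lines. The prompt and default model name are placeholders:

```ts
import { transcribe } from './transcribe'; // the in-browser sketch above

export async function audioToDocument(
  samples: Float32Array,
  model = 'llama3', // any model you've pulled with `ollama pull`
): Promise<string> {
  const transcript = await transcribe(samples); // runs in the browser tab

  // Post-process with the local model. `stream: false` returns a single
  // JSON object instead of streamed chunks.
  const res = await fetch('http://localhost:11434/api/generate', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      model,
      prompt: `Clean up this dictation into a formatted document:\n\n${transcript}`,
      stream: false,
    }),
  });
  const data = (await res.json()) as { response: string };
  return data.response; // goes straight into the editor
}
```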
One rule: pipelines are either fully local or fully cloud. No mixing. Keeps things predictable and avoids accidental data leakage.
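Enforcing that rule is simple if every pipeline step declares its locality up front. A sketch of one way to do it (the types and function are illustrative, not pulled from the codebase):

```ts
type Locality = 'local' | 'cloud';

interface PipelineStep {
  name: string;
  locality: Locality;
}

// Reject mixed pipelines before they run, rather than discovering mid-run
// that one step would send data off-machine.
export function assertUniformLocality(steps: PipelineStep[]): Locality {
  if (steps.length === 0) throw new Error('Pipeline has no steps');
  const localities = new Set(steps.map((s) => s.locality));
  if (localities.size > 1) {
    throw new Error('Pipelines must be fully local or fully cloud; no mixing.');
  }
  return steps[0].locality;
}
```

Failing loudly at pipeline setup beats silently routing one step to a cloud endpoint.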
Why I built this
Two reasons. First, privacy. I work with medical reports. A lot of RONTGEN users are in healthcare, legal, or research — fields where sending audio to third-party servers is a real concern. Local processing removes that friction entirely.
Second, hardware. A lot of people have Apple Silicon Macs or gaming PCs with GPUs that are idle most of the time. Running local AI on that hardware is genuinely fast and costs nothing per request.
The cloud providers are still better for accuracy and speed on most hardware — but for the right user and the right workflow, local is the right call.
Full writeup: https://rontgen.app/blog/run-ai-locally.html
Happy to answer questions about the implementation if anyone's curious — the Transformers.js + WebGPU path had some interesting edge cases.