Hi everyone,
I founded NLP Cloud (https://nlpcloud.com) around 4 years ago. It's an AI API that I offer as an OpenAI alternative for users who want a stronger focus on privacy, ease of use, and quality of support.
I would like to tell you a bit more about my tech stack. It hasn't actually changed much since the beginning, except that I now serve hundreds of AI models instead of only a couple.
The key here is container orchestration. My whole stack is based on Docker containers: each AI model is deployed in a Docker container on a specific GPU server. My initial orchestrator was Docker Swarm; now it's Kubernetes. I wouldn't necessarily recommend Kubernetes to new startups though, as it can be unnecessarily complex.
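To make this more concrete, here is a minimal sketch of what one such model container might run inside. The framework choice, model, port, and endpoint below are hypothetical examples for illustration, not my actual code:

```python
# Sketch of one model container: a small HTTP server wrapping a single model.
# The model name and endpoint path are made-up examples, not NLP Cloud's code.
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()

# One container = one model, loaded once at startup and pinned to this server's GPU.
generator = pipeline("text-generation", model="gpt2", device=0)  # device=0 -> first GPU

class GenerationRequest(BaseModel):
    text: str
    max_length: int = 50

@app.post("/generate")
def generate(req: GenerationRequest):
    output = generator(req.text, max_length=req.max_length)
    return {"generated_text": output[0]["generated_text"]}
```

A container like this gets packaged into an image and scheduled by the orchestrator onto a GPU node.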
In front of all these containers, I use a Traefik-based load balancer to route customer requests to the right AI models. Traefik's documentation is a bit cryptic, but apart from that this tool has never disappointed me!
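As a rough illustration of that routing, here is what a Traefik dynamic configuration (file provider syntax) can look like; the model name, path, and port are made-up examples, not my actual setup:

```yaml
# Hypothetical Traefik routing sketch: requests whose path starts with
# /v1/example-model are forwarded to that model's container.
http:
  routers:
    example-model:
      rule: "PathPrefix(`/v1/example-model`)"
      service: example-model
  services:
    example-model:
      loadBalancer:
        servers:
          - url: "http://example-model:8080"
```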
In terms of billing, I offer both pre-paid and pay-as-you-go plans. Pay-as-you-go is quite a challenge, as I need to carefully meter each user's consumption of my API (number of requests, model used, number of tokens per request...) without harming performance. For that I use a time-series database called TimescaleDB, which is basically a PostgreSQL database optimized for time-series data.
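As an illustration, here is roughly what such metering could look like with TimescaleDB. The table layout and column names are a hypothetical sketch, not my actual schema:

```python
# Sketch of pay-as-you-go metering with TimescaleDB (hypothetical schema).
import psycopg2

conn = psycopg2.connect("dbname=metering user=postgres")
with conn, conn.cursor() as cur:
    # A regular PostgreSQL table turned into a TimescaleDB hypertable,
    # partitioned by time so writes stay fast as usage data grows.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS api_usage (
            time    TIMESTAMPTZ NOT NULL,
            user_id TEXT        NOT NULL,
            model   TEXT        NOT NULL,
            tokens  INTEGER     NOT NULL
        );
        SELECT create_hypertable('api_usage', 'time', if_not_exists => TRUE);
    """)
    # One cheap insert per API request, so metering doesn't hurt latency.
    cur.execute(
        "INSERT INTO api_usage (time, user_id, model, tokens) VALUES (now(), %s, %s, %s)",
        ("user_123", "example-model", 42),
    )
    # Billing query: total tokens per model for one user over the last 30 days.
    cur.execute("""
        SELECT model, sum(tokens)
        FROM api_usage
        WHERE user_id = %s AND time > now() - interval '30 days'
        GROUP BY model
    """, ("user_123",))
    print(cur.fetchall())
```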
As far as the user interface is concerned, I use Python/Django and Go microservices. I also use HTMX on the frontend side.
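For instance, a Django view serving an HTMX frontend can look roughly like this; the view, template, and helper names below are made up for illustration, not NLP Cloud's actual code:

```python
# Hypothetical Django view illustrating the Django + HTMX combination.
from django.shortcuts import render

def get_usage_for(user):
    # Stub standing in for a real database query.
    return []

def usage_dashboard(request):
    usage = get_usage_for(request.user)
    if request.headers.get("HX-Request"):
        # HTMX sets the HX-Request header: return only the fragment
        # that HTMX will swap into the existing page.
        return render(request, "partials/usage_table.html", {"usage": usage})
    # Plain browser navigation: return the full page.
    return render(request, "usage_dashboard.html", {"usage": usage})
```

In the template, an element with hx-get pointing at this view and an hx-target for the table is enough to refresh just that fragment, with no custom JavaScript.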
I must confess that design has never been my main focus for NLP Cloud, but that has never prevented the business from growing, and today I have a lot of loyal customers who don't really seem to care about design. My customers are developers after all: what they want above all is a robust and well-documented API, and that's it.
When I started NLP Cloud 4 years ago, infrastructure costs were not really a concern. Then LLMs appeared, and today my infrastructure costs are huge! That's why container orchestration and microservices are key here: staying cloud agnostic lets me get the cheapest GPU servers and remain competitive.
I will be more than happy to answer any questions you have about this tech stack!