Transformers.js is one of the few AI innovations that truly have the potential to change everything you know about LLMs.
When Apple finally announced its entry into AI with “Apple Intelligence,” much of the focus was on its partnership with OpenAI. But the bigger news was that Apple would be turning its own foundation models into features that run directly on the device.
As Benedict Evans puts it, Apple is betting that generative AI is most useful when embedded in a system that gives it broader context about the user, rather than as a single general-purpose interface product.
In other words, Apple is going local.
This is obviously a dramatic change from AI’s current meta of giant chatbots from Big Tech and fine-tuned wrappers from indie hackers, but it makes sense for all parties involved:
The developer gets to build a product that doesn’t cost a fortune in API calls, isn’t reliant on another company’s servers staying up, will never get rate-limited, and is free from any censorship restrictions.
The user gets a product that better understands them, fits in directly with their workflow, can operate without an internet connection, has better latency, and is free from the privacy concerns associated with cloud-based models.
One of the more popular ways developers are starting to experiment with this trend is Transformers.js.
To discuss Transformers.js, we first have to discuss its older sibling, Transformers, the Hugging Face library that enables developers to run pre-trained models locally in Python. It’s an amazing tool, but as you probably know, Python isn’t exactly ideal for web development.
Enter Transformers.js, a library that ports Transformers to JavaScript. The result is that developers can now pull from 979 (and counting) different AI models to build AI applications directly in the browser.
The original Transformers library is built on PyTorch, a Python machine learning framework. PyTorch gives developers the building blocks for deep learning applications, which they can then use to train and run architectures like gated recurrent units, long short-term memory networks, and transformers.
Transformers.js brings that library to JavaScript by way of the Open Neural Network Exchange (ONNX). ONNX is an open standard for encoding a trained model in a portable format, so it can be saved and transferred between frameworks. Once a model has been exported to ONNX, you load and run it with ONNX Runtime, an inference engine that executes ONNX models programmatically, including in the browser via WebAssembly.
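To give a sense of what that runtime layer looks like on its own, here is a minimal sketch using onnxruntime-web, the engine Transformers.js builds on. The model URL, input name, and tensor shape below are placeholders for illustration; a real model defines its own.

```javascript
import * as ort from 'onnxruntime-web';

// Load an exported ONNX model (URL is a placeholder for illustration).
const session = await ort.InferenceSession.create('https://example.com/model.onnx');

// Build an input tensor; the input name ('input') and shape depend on the model.
const data = Float32Array.from({ length: 1 * 3 * 224 * 224 }, () => Math.random());
const input = new ort.Tensor('float32', data, [1, 3, 224, 224]);

// Run inference and read the outputs back by name.
const outputs = await session.run({ input });
console.log(outputs);
```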
It all adds up to web developers being able to access these models in JavaScript with just a few lines of code.
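In practice you rarely touch ONNX Runtime directly; Transformers.js wraps it in a pipeline API. A minimal sentiment-analysis sketch using the @xenova/transformers package (the model downloaded is whatever default the library ships for that task):

```javascript
import { pipeline } from '@xenova/transformers';

// Create a pipeline for a task; the model is downloaded and cached on first use.
const classifier = await pipeline('sentiment-analysis');

// Run it like a regular async function.
const result = await classifier('Running models in the browser is surprisingly easy.');
console.log(result); // e.g. [{ label: 'POSITIVE', score: 0.99 }]
```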
Transformers.js can handle tasks across four areas (a sketch of one of them follows the list):
Natural language processing: text classification, named entity recognition, question answering, language modeling, summarization, translation, multiple choice, and text generation.
Computer vision: image classification, object detection, and segmentation.
Audio: automatic speech recognition and audio classification.
Multimodal: zero-shot image classification.
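Each of those tasks follows the same pipeline pattern from the snippet above; only the task string and the model change. For example, object detection might look like this, assuming one of the ONNX-converted checkpoints on the Hugging Face Hub (the model id is illustrative; any compatible one works):

```javascript
import { pipeline } from '@xenova/transformers';

// Object detection with an ONNX-converted DETR checkpoint (model id assumed).
const detector = await pipeline('object-detection', 'Xenova/detr-resnet-50');

// Accepts a URL, a local path, or raw image data.
const detections = await detector('https://example.com/photo.jpg', { threshold: 0.9 });
console.log(detections); // [{ label: 'cat', score: 0.98, box: { xmin, ymin, xmax, ymax } }, ...]
```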
So naturally, there have been lots of cool projects built with it.
One really recent (like yesterday) example is Florence-2, Microsoft’s new vision foundation model. With Transformers.js, Florence-2 can handle tasks like image captioning, optical character recognition, and object detection directly in the user’s browser.
Another example is Depth Anything V2, a model that lets you estimate depth in real time directly in your browser. And, because the smallest model is only ~50MB, you can actually run it in the browser on your phone.
On the audio front, you have projects like Whisper WebGPU, which supports real-time in-browser speech recognition and multilingual transcription across 100 different languages.
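A demo like that layers more on top (WebGPU acceleration, live microphone input), but the core transcription step can be expressed with the same pipeline pattern. A minimal sketch, assuming the Xenova/whisper-tiny.en checkpoint and a hosted audio file:

```javascript
import { pipeline } from '@xenova/transformers';

// Speech recognition with a small English-only Whisper checkpoint (model id assumed).
const transcriber = await pipeline('automatic-speech-recognition', 'Xenova/whisper-tiny.en');

// Accepts a URL to an audio file or raw audio data.
const { text } = await transcriber('https://example.com/speech.wav');
console.log(text);
```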
And for the chatbot fans, you can also run models like Phi-3-Mini directly in your browser at 70 tokens per second. Basically, you can have a personal chatbot that is private, fast, and always accessible.
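The same goes for chat-style models: a text-generation pipeline pointed at an ONNX-converted small model gives you a fully local generator. A sketch, assuming an ONNX-converted Phi-3 checkpoint on the Hub (the exact model id is the assumption here, and the demo’s 70 tokens per second relies on WebGPU acceleration rather than this plain setup):

```javascript
import { pipeline } from '@xenova/transformers';

// Text generation with an ONNX-converted small model (model id assumed for illustration).
const generator = await pipeline('text-generation', 'Xenova/Phi-3-mini-4k-instruct');

const output = await generator('Write one sentence about running LLMs in the browser.', {
  max_new_tokens: 64,
});
console.log(output[0].generated_text);
```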
We can go on, but there are literally dozens of examples. If you want to check out more of them (and we definitely recommend doing so), you can scroll through the Hugging Face profile of Xenova, the mastermind behind the project.
Transformers.js’s capabilities are still quite limited. Being confined to models that can run in a browser will do that. But it’s clear that the winds of AI are shifting in its direction.
This is because AI is getting smaller and smaller. Layer pruning, weight quantization, and more efficient training strategies are producing small language models (SLMs) that rival their LLM predecessors. Phi-3, for example, was introduced as a “highly capable language model locally on your phone.” As these smaller models improve, the edge in AI products shifts from data, compute, and research to distribution.
If this is true, then mobile will likely be the main battleground for AI applications in the future, considering its 60.08% market share compared to desktops' 37.85%. Yes, Apple and Google will have their own on-device models. But, in the same way that there was room for outside apps in the pre-AI era, there will be room for outside apps in the AI era.
Transformers.js is well-positioned to take advantage of this shift because it runs anywhere a modern browser does, including on mobile. And now is the time to play around with it while it’s still relatively early.
Who knows, it may be like getting on the App Store in 2009.