Between Meta, Google, Flux, and Nvidia, a slew of new AI models have been released in the past week.
Some make it cheaper to develop AI apps. Others make your applications faster. Still others enable tasks that were previously impossible with older AI models.
A couple of these models have APIs, so you can start using them right away.
Here's a look at what got released and what it means for indie hackers:
Google announced that Gemini 1.5 Flash-8B, its latest "Flash" variant, is now production-ready and comes with:
a 50% reduction in price (compared to 1.5 Flash)
2x higher rate limits
Lower latency on small prompts
Here's how that translates to money:
$0.0375 per 1 million input tokens on prompts <128K
$0.15 per 1 million output tokens on prompts <128K
$0.01 per 1 million tokens on cached prompts <128K
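To put those rates in perspective, here's a back-of-the-envelope cost estimate. The per-token prices are the ones listed above; the traffic figures are invented purely for illustration:

```python
# Gemini 1.5 Flash-8B pricing for prompts under 128K tokens (per the announcement)
INPUT_PER_M = 0.0375   # $ per 1M input tokens
OUTPUT_PER_M = 0.15    # $ per 1M output tokens
CACHED_PER_M = 0.01    # $ per 1M cached input tokens

def monthly_cost(input_tokens, output_tokens, cached_tokens=0):
    """Estimate monthly API spend in dollars for a given token volume."""
    return (input_tokens * INPUT_PER_M
            + output_tokens * OUTPUT_PER_M
            + cached_tokens * CACHED_PER_M) / 1_000_000

# Hypothetical workload: 10K chats/day for 30 days,
# ~600 input and ~200 output tokens per chat
chats = 10_000 * 30
cost = monthly_cost(chats * 600, chats * 200)
print(f"${cost:.2f}/month")  # → $15.75/month
```

At these prices, even a quarter-billion tokens a month stays in the tens of dollars, which is what makes the high-volume use cases below viable for a solo developer.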
Google released this model last month, but it was an experimental version, and the model wasn't generally available. That's no longer the case.
According to Google product manager Logan Kilpatrick:
“We see the most potential for this model in tasks ranging from high volume multimodal use cases to long context summarization tasks. Gemini 1.5 Flash-8B is best suited for simple, higher volume tasks.”
Here are some examples of what you could create with these capabilities:
A real-time chatbot
A fast news-summarization service
Anything built around simple, high-volume tasks (classifying things, splitting long strings into words where no clean algorithmic solution exists, etc.)
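As a sketch of that "simple, high-volume" pattern, here's a minimal support-ticket classifier hitting the Gemini REST API from the standard library. The endpoint shape follows Google's generativelanguage API, but the model name, label set, and prompt wording are assumptions to verify against the docs:

```python
# Sketch: high-volume text classification on Gemini 1.5 Flash-8B.
# The label set and prompt wording here are illustrative assumptions.
import json
import urllib.request

API_URL = ("https://generativelanguage.googleapis.com/v1beta/"
           "models/gemini-1.5-flash-8b:generateContent")
LABELS = ["billing", "bug report", "feature request", "other"]

def build_prompt(ticket: str) -> str:
    """Pack the instruction and the ticket into a single prompt string."""
    return (f"Classify this support ticket as one of: {', '.join(LABELS)}. "
            f"Reply with the label only.\n\nTicket: {ticket}")

def classify(ticket: str, api_key: str) -> str:
    """One API round-trip per ticket -- cheap at Flash-8B prices."""
    body = json.dumps({
        "contents": [{"parts": [{"text": build_prompt(ticket)}]}]
    }).encode()
    req = urllib.request.Request(
        f"{API_URL}?key={api_key}", data=body,
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return data["candidates"][0]["content"]["parts"][0]["text"].strip()
```

The same skeleton works for any of the use cases above: swap the prompt and you have a summarizer instead of a classifier.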
Flux is one of the world's most popular text-to-image AI models.
Last week, Black Forest Labs (the company behind Flux) finally announced an official API. Prices start at 2.5 credits per image, where $1 nets you 100 credits; Flux 1.1 Pro costs 4 credits per image.
Here are some image samples from the Pro model:
Flux is better than Midjourney and other models when it comes to customisability: it supports fine-tunes, LoRAs, and ControlNet.
Last time I checked, a lot of people were using Flux for creating virtual influencers. You could explore this more and maybe build an automated workflow for creating virtual influencers now that the API is available to developers.
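If you wanted to automate that kind of workflow, here's a minimal sketch using the Replicate-hosted version of the model. The `replicate.run` call and model slug are real; the cost constants reflect the credit pricing above, and the return-value handling is an assumption to check against the model page:

```python
# Sketch: batch image generation against Flux 1.1 Pro hosted on Replicate.
CREDITS_PER_IMAGE = 4      # Flux 1.1 Pro, per the announcement
DOLLARS_PER_CREDIT = 0.01  # $1 buys 100 credits

def batch_cost(n_images: int) -> float:
    """Dollar cost of generating n images at the official credit pricing."""
    return n_images * CREDITS_PER_IMAGE * DOLLARS_PER_CREDIT

def generate(prompt: str) -> str:
    """One image per call; returns a URL to the generated output."""
    import replicate  # lazy import; needs REPLICATE_API_TOKEN in the env
    out = replicate.run("black-forest-labs/flux-1.1-pro",
                        input={"prompt": prompt})
    return str(out)
```

A daily-posting virtual influencer at one image per day would cost roughly `batch_cost(30)` ≈ $1.20/month in generation fees.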
NVIDIA has introduced NVLM 1.0, a multimodal LLM that rivals GPT-4 in benchmarks:
The key here is open source, meaning you can host it on your own machine.
Here's one use case:
The other key is multimodality: you can provide an image to the model, and it can explain the image or give you new ideas. For example, you could create:
A database of frames from TikTok videos that you feed to the AI to generate new angles/hooks
A novel meme generator (provide an image, get text for a meme)
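As a rough shape for the meme-generator idea, here's a sketch against the open-weights NVLM-D 72B checkpoint NVIDIA published on Hugging Face. A 72B model needs serious GPU memory, so treat this as scaffolding rather than a recipe; the model's inference interface is loaded via `trust_remote_code`, and its exact usage should be checked against the model card:

```python
# Sketch: image in, meme caption out, using NVIDIA's open NVLM-D weights.
MODEL_ID = "nvidia/NVLM-D-72B"

def meme_instruction(style: str = "deadpan") -> str:
    """Text instruction sent to the model alongside the image."""
    return (f"Write a short, {style} meme caption for this image. "
            "Reply with the caption text only.")

def load_model():
    """Pull the checkpoint from Hugging Face (heavy: ~150GB of weights)."""
    from transformers import AutoModel, AutoTokenizer  # lazy heavy imports
    model = AutoModel.from_pretrained(
        MODEL_ID, trust_remote_code=True, device_map="auto")
    tokenizer = AutoTokenizer.from_pretrained(
        MODEL_ID, trust_remote_code=True)
    return model, tokenizer
```

Because the weights are open, you could also quantize or distill the model to bring the hosting cost down to indie-hacker levels.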
Meta's new video-generation model is not yet available. However, its capabilities are impressive: it can generate personalized videos, along with matching sound.
Say you have a photo of someone's face: you could create a video of that person DJing at a party.
You could also easily modify an existing video, like dressing up a penguin or putting a dynamic background behind a person.
Exciting news! The releases from Meta, Google, Flux, and Nvidia open up great opportunities for developers. Google’s Gemini 1.5 Flash-8B offers cost reductions and higher rate limits, making it accessible for indie hackers to create applications like real-time chatbots and news summarization services. Plus, with Flux’s new API, we can expect innovative text-to-image projects. Can't wait to see what comes next!
Only Gemini offers free access to their API
Flux could already be accessed through an API on Replicate. Now Flux 1.1 Pro is available there too: https://replicate.com/black-forest-labs/flux-1.1-pro