Between Meta, Google, Flux, and Nvidia, a slew of new AI models have been released in the past week.
Some make it cheaper to develop AI apps. Others make your applications faster. Still others enable tasks that were previously impossible with older AI models.
A couple of these models have APIs, so you can start using them right away.
Here's a look at what got released and what it means for indie hackers:
Google announced that Gemini 1.5 Flash-8B, its latest "Flash" variant, is now production-ready and comes with:
a 50% reduction in price (compared to 1.5 Flash)
2x higher rate limits
Lower latency on small prompts
Here's how that translates to money:
$0.0375 per 1 million input tokens on prompts <128K
$0.15 per 1 million output tokens on prompts <128K
$0.01 per 1 million tokens on cached prompts <128K
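To put those rates in perspective, here's a back-of-the-envelope cost estimate. The per-token prices are the ones listed above; the traffic figures are invented purely for illustration:

```python
# Gemini 1.5 Flash-8B pricing for prompts under 128K tokens (per the announcement)
INPUT_PER_M = 0.0375   # $ per 1M input tokens
OUTPUT_PER_M = 0.15    # $ per 1M output tokens
CACHED_PER_M = 0.01    # $ per 1M cached input tokens

def monthly_cost(input_tokens, output_tokens, cached_tokens=0):
    """Estimate monthly API spend in dollars for a given token volume."""
    return (input_tokens * INPUT_PER_M
            + output_tokens * OUTPUT_PER_M
            + cached_tokens * CACHED_PER_M) / 1_000_000

# Hypothetical workload: 10K chats/day for 30 days,
# ~600 input and ~200 output tokens per chat
chats = 10_000 * 30
cost = monthly_cost(chats * 600, chats * 200)
print(f"${cost:.2f}/month")  # → $15.75/month
```

At these prices, even a quarter-billion tokens a month stays in the tens of dollars, which is what makes the high-volume use cases below viable for a solo developer.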
Google released this model last month, but it was an experimental version, and the model wasn't generally available. That's no longer the case.
According to Google product manager Logan Kilpatrick:
“We see the most potential for this model in tasks ranging from high volume multimodal use cases to long context summarization tasks. Gemini 1.5 Flash-8B is best suited for simple, higher volume tasks.”
Here are some examples of what you could create with these capabilities:
A real-time chatbot
A fast news-summarization service
Anything built around simple, high-volume tasks (classifying things, splitting long strings into words where no clean algorithmic solution exists, etc.)
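As a sketch of that "simple, high-volume" pattern, here's a minimal support-ticket classifier hitting the Gemini REST API from the standard library. The endpoint shape follows Google's generativelanguage API, but the model name, label set, and prompt wording are assumptions to verify against the docs:

```python
# Sketch: high-volume text classification on Gemini 1.5 Flash-8B.
# The label set and prompt wording here are illustrative assumptions.
import json
import urllib.request

API_URL = ("https://generativelanguage.googleapis.com/v1beta/"
           "models/gemini-1.5-flash-8b:generateContent")
LABELS = ["billing", "bug report", "feature request", "other"]

def build_prompt(ticket: str) -> str:
    """Pack the instruction and the ticket into a single prompt string."""
    return (f"Classify this support ticket as one of: {', '.join(LABELS)}. "
            f"Reply with the label only.\n\nTicket: {ticket}")

def classify(ticket: str, api_key: str) -> str:
    """One API round-trip per ticket -- cheap at Flash-8B prices."""
    body = json.dumps({
        "contents": [{"parts": [{"text": build_prompt(ticket)}]}]
    }).encode()
    req = urllib.request.Request(
        f"{API_URL}?key={api_key}", data=body,
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return data["candidates"][0]["content"]["parts"][0]["text"].strip()
```

The same skeleton works for any of the use cases above: swap the prompt and you have a summarizer instead of a classifier.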
Flux is one of the world's most popular text-to-image AI models.
Last week, Black Forest Labs (the company behind Flux) finally announced an official API. Prices start at 2.5 credits per image, where $1 nets you 100 credits; Flux 1.1 Pro costs 4 credits per image.
Here are some image samples from the Pro model:
Flux is better than Midjourney and other models when it comes to customisability: it supports fine-tunes, LoRAs, and ControlNet.
Last time I checked, a lot of people were using Flux for creating virtual influencers. You could explore this more and maybe build an automated workflow for creating virtual influencers now that the API is available to developers.
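If you wanted to automate that kind of workflow, here's a minimal sketch using the Replicate-hosted version of the model. The `replicate.run` call and model slug are real; the cost constants reflect the credit pricing above, and the return-value handling is an assumption to check against the model page:

```python
# Sketch: batch image generation against Flux 1.1 Pro hosted on Replicate.
CREDITS_PER_IMAGE = 4      # Flux 1.1 Pro, per the announcement
DOLLARS_PER_CREDIT = 0.01  # $1 buys 100 credits

def batch_cost(n_images: int) -> float:
    """Dollar cost of generating n images at the official credit pricing."""
    return n_images * CREDITS_PER_IMAGE * DOLLARS_PER_CREDIT

def generate(prompt: str) -> str:
    """One image per call; returns a URL to the generated output."""
    import replicate  # lazy import; needs REPLICATE_API_TOKEN in the env
    out = replicate.run("black-forest-labs/flux-1.1-pro",
                        input={"prompt": prompt})
    return str(out)
```

A daily-posting virtual influencer at one image per day would cost roughly `batch_cost(30)` ≈ $1.20/month in generation fees.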
NVIDIA has introduced NVLM 1.0, a multimodal LLM that rivals GPT-4 in benchmarks:
The key here is open source, meaning you can host it on your own machine.
Here's one use case:
The other key is multimodality: you can provide an image to the model, and it can explain the image or give you new ideas. For example, you could create:
A database of frames from TikTok videos that you feed to the AI to generate new angles/hooks
A novel meme generator (provide an image, get text for a meme)
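As a rough shape for the meme-generator idea, here's a sketch against the open-weights NVLM-D 72B checkpoint NVIDIA published on Hugging Face. A 72B model needs serious GPU memory, so treat this as scaffolding rather than a recipe; the model's inference interface is loaded via `trust_remote_code`, and its exact usage should be checked against the model card:

```python
# Sketch: image in, meme caption out, using NVIDIA's open NVLM-D weights.
MODEL_ID = "nvidia/NVLM-D-72B"

def meme_instruction(style: str = "deadpan") -> str:
    """Text instruction sent to the model alongside the image."""
    return (f"Write a short, {style} meme caption for this image. "
            "Reply with the caption text only.")

def load_model():
    """Pull the checkpoint from Hugging Face (heavy: ~150GB of weights)."""
    from transformers import AutoModel, AutoTokenizer  # lazy heavy imports
    model = AutoModel.from_pretrained(
        MODEL_ID, trust_remote_code=True, device_map="auto")
    tokenizer = AutoTokenizer.from_pretrained(
        MODEL_ID, trust_remote_code=True)
    return model, tokenizer
```

Because the weights are open, you could also quantize or distill the model to bring the hosting cost down to indie-hacker levels.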
Meta's new video-generation model is not yet available. However, its capabilities are impressive: it can generate personalized videos, along with matching sound.
Say you have a photo of someone's face: you could create a video of that person DJing at a party.
You could also easily modify an existing video, like dressing up a penguin or putting a dynamic background behind a person.
Exciting news! The releases from Meta, Google, Flux, and Nvidia open up great opportunities for developers. Google’s Gemini 1.5 Flash-8B offers cost reductions and higher rate limits, making it accessible for indie hackers to create applications like real-time chatbots and news summarization services. Plus, with Flux’s new API, we can expect innovative text-to-image projects. Can't wait to see what comes next!
Only Gemini offers free access to their API
Flux could already be accessed through an API on Replicate. Now Flux 1.1 Pro is available there too: https://replicate.com/black-forest-labs/flux-1.1-pro