Less than a month ago, OpenAI released GPT-4 with vision (GPT-4V) in ChatGPT.
What it does: Basically you can upload an image and it can answer questions about that image.
There's not yet an API for this feature. There are rumors, however, that OpenAI will soon make some major API announcements, and this will probably be among them.
What you can build: We've all seen the huge number of apps built on OpenAI's GPT-4 API. Some of the winners were the folks who were first to make use of that API. Things will probably be the same with GPT-4 vision.
Let's explore some of the things that are possible to build thanks to this new feature, so that you can be ready to rock 'n' roll when the API becomes available:
You can upload an actual picture of a user interface and get GPT-4V to output the HTML/CSS/React code for it:
What you can build: A "reverse UI" decoder tool. You can analyze/decode some of the world's most popular user interfaces. You can charge for people to upload their own designs and convert those designs to HTML.
In other words, this has the potential to replace people who do "Figma to HTML" conversion.
Upload an image from a textbook and GPT-4 Vision will solve the problem:
What you can build: A tool where people upload an image of a textbook problem and specify the subject, and the tool solves that problem.
You can upload an image of a receipt to GPT-4V and get a bunch of information in return:
This is a godsend for accountants who deal with a lot of offline data.
What you can build: An (offline) document organizer for accountants. You can customize this to Shopify/Quickbooks and build it as an extension to their platform.
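Since there's no API yet, here's a minimal sketch of only the part that comes after the model call: turning a reply into a record an accounting tool could store. It assumes you prompt GPT-4V to answer in JSON. The `Receipt` fields, `parse_receipt_reply` and the sample reply are all hypothetical, not anything OpenAI has documented.

```python
import json
from dataclasses import dataclass


@dataclass
class Receipt:
    # Hypothetical fields; pick whatever your accounting workflow needs.
    vendor: str
    date: str
    total: float


def parse_receipt_reply(reply_text: str) -> Receipt:
    """Turn a JSON reply into a Receipt, raising on missing or malformed fields."""
    data = json.loads(reply_text)
    return Receipt(
        vendor=str(data["vendor"]),
        date=str(data["date"]),
        total=float(data["total"]),
    )


# Made-up reply for illustration; a real reply may need cleanup before parsing.
sample_reply = '{"vendor": "Corner Cafe", "date": "2023-10-12", "total": "18.40"}'
receipt = parse_receipt_reply(sample_reply)
print(receipt)
```

Failing loudly on a missing field is deliberate: for bookkeeping, a rejected receipt is easier to deal with than a silently wrong one.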
This is an interesting prompt:

What you can build: Imagine scaling this to a whole restaurant. A camera is taking a picture of every table and then asking GPT-4V to calculate the price. Then it sums up everything and it gives the restaurant owner an overview of the total expenses for the day.
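The "sum up everything" step is the easy part, and it could look something like this. This is just a sketch: it assumes you've already gotten one price per photo out of GPT-4V, and the `daily_overview` function and table IDs are made up for illustration.

```python
from collections import defaultdict
from decimal import Decimal


def daily_overview(table_bills):
    """Sum (table_id, amount) pairs into per-table totals and a grand total."""
    per_table = defaultdict(Decimal)  # Decimal() starts each table at 0
    for table_id, amount in table_bills:
        per_table[table_id] += Decimal(amount)
    grand_total = sum(per_table.values(), Decimal("0"))
    return dict(per_table), grand_total


# Hypothetical prices, passed as strings so Decimal keeps them exact.
bills = [("table-1", "12.50"), ("table-2", "30.00"), ("table-1", "7.25")]
per_table, total = daily_overview(bills)
print(per_table, total)
```

Using `Decimal` instead of floats avoids rounding surprises when you add up money.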
GPT-4V is surprisingly good at taking a picture of a dish and telling you how to make it:
What you can build: A "reverse recipe" creator where people can upload a picture of their favorite food. If the app gets popular, you can add functionality so that people can rate the "accuracy" of the recipe, suggest changes, etc.
Take a look at this:
Basically you can upload an image of the gym equipment you have and (potentially) an image of yourself so it tells you what exercises to focus on.
What you can build: A "scrappy" workout planner where you can make workout plans from seemingly simple things people have in their home.
It seems that you can upload ANY picture with a question into GPT-4V and it'll give you the answer in return:
What you can build: A tool where people can upload an image with any question on it (a puzzle, a homework problem, etc.)
GPT-4V can actually detect scenes from movies:
What you can build: I've seen many YouTube/TikTok videos where people ask something like: "Which movie is this?" You could make a tool that would parse that video, extract a few pictures from it and then ask GPT-4V a question like the one in the tweet.
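For the "extract a few pictures" step, one simple approach is to pick evenly spaced frames. Here's a rough sketch of just the index math. It assumes you know the video's total frame count, `sample_frame_indices` is a made-up helper, and actually decoding the frames is left to whatever video library you use.

```python
def sample_frame_indices(total_frames: int, count: int) -> list[int]:
    """Pick up to `count` evenly spaced frame indices, one from the middle of each segment."""
    if total_frames <= 0 or count <= 0:
        return []
    count = min(count, total_frames)  # never ask for more frames than exist
    step = total_frames / count
    return [int(step * i + step / 2) for i in range(count)]


print(sample_frame_indices(300, 3))
```

Sampling from the middle of each segment skips the very first and last frames, which in short clips are often title cards or fades.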
You can take a picture of a room and ask GPT-4V to suggest any additions to it:
What you can build: A tool that will let people take a picture of their room and suggest improvements. You could then match those improvements with DALL-E 3 and create a visual representation of the room. You could also provide users with similar-looking rooms.
Speaking of DALL-E 3...
You can upload an image to GPT-4V and ask it to create a DALL-E/Midjourney prompt from it:

What you can build: A "reverse prompt" tool where people upload images and get prompts in return. You can then run those prompts and compare the results to the original image. You can even create a "human vs AI" tool where you'd have a bunch of images side by side: the original on the left, and on the right a DALL-E/Midjourney image created from a GPT-4V prompt.
GPT-4V can accurately answer questions like these:

What you can build: A travel app where people take a picture of what they're seeing. As an output they'll get a detailed description of the object, along with similar objects, their distance from the target object, etc.
GPT-4V can do a good job of recognizing the conditions of the road:

What you can build: An app that connects to someone's dash cam and periodically analyzes the image for any conditions they need to be aware of.
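"Periodically" matters here, because you probably won't want to send every single frame to the model. A minimal sketch of the throttling logic, assuming each frame comes with a timestamp in seconds (`frames_to_analyze` is a hypothetical helper):

```python
def frames_to_analyze(timestamps, interval_seconds):
    """Keep a timestamp only if at least interval_seconds have passed since the last kept one."""
    kept = []
    last = None
    for ts in timestamps:
        if last is None or ts - last >= interval_seconds:
            kept.append(ts)
            last = ts
    return kept


# Frames arriving at irregular times; analyze at most one every 5 seconds.
print(frames_to_analyze([0, 1, 2, 5, 6, 10, 11], 5))
```

The right interval is a guess until there's an API and you know its latency and pricing.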
GPT-4V can do a pretty good job of analyzing memes:

What you can build: The reverse. Upload an image and ask it to create a few meme ideas for you.
GPT-4V is pretty good at analyzing cues in an image:

What you can build: A game where a person tries to write down all the things they notice about a picture. Then, ask GPT-4V about the same picture and compare the person's answers with GPT-4V's.
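The comparison step could start out very simple. This sketch assumes you already have both lists of observations as short phrases, and it only does naive exact matching after normalizing case and whitespace. `score_observations` is a made-up name, and a real game would likely need fuzzier matching.

```python
def score_observations(player, model):
    """Compare two lists of short observations using normalized exact matching."""

    def normalize(text):
        # Lowercase and collapse whitespace so "Red  Car" matches "red car".
        return " ".join(text.lower().split())

    player_set = {normalize(p) for p in player}
    model_set = {normalize(m) for m in model}
    return {
        "matched": sorted(player_set & model_set),
        "missed": sorted(model_set - player_set),  # the model saw it, the player didn't
        "extra": sorted(player_set - model_set),   # the player saw it, the model didn't
    }


result = score_observations(
    ["Red  car", "open window"],
    ["red car", "wet pavement"],
)
print(result)
```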
Take a look at this prompt:

It turns out that GPT-4V is pretty good at translating images to code.
What you can build: A code builder that takes a graph/sketch and translates it into code.
You can take pictures that feature stats and convert them to useful answers/insights with GPT-4V:

What you can build: An "insights generator". Get people to choose a niche. Your tool will then search for stats for that niche, feed them into GPT-4V and ask it to generate some unique insights.
Take a look at this GPT-4V prompt:

What you can build: Many images have corresponding text formats. GPT-4V is pretty good at converting to those formats. You can build an image-to-text tool that will do this.
This is a pretty interesting GPT-4V prompt:

Basically, you can take a whole page and ask GPT-4V to analyze the news items.
What you can build: A "news summarizer" tool that will take your favorite site and tell you about the "themes", etc.
GPT-4V can be good at predicting emotions from an image:

What you can build: You can use this to analyze frames from an ad and give an "emotion" summary. You can sell this as a competitor research tool.
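The "emotion summary" could be as basic as counting labels across frames. A rough sketch, assuming you've asked GPT-4V for one emotion label per frame (the labels and the `emotion_summary` function here are hypothetical):

```python
from collections import Counter


def emotion_summary(frame_labels):
    """Return each emotion's share of frames, most common first."""
    counts = Counter(label.strip().lower() for label in frame_labels)
    total = sum(counts.values())
    return {label: round(n / total, 2) for label, n in counts.most_common()}


# Made-up labels for four frames of an ad.
summary = emotion_summary(["Joy", "joy", "surprise", "joy "])
print(summary)
```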
You can also use GPT-4V to predict what people will find to be more beautiful:

GPT-4V can be pretty good at evaluating the damage something has endured:

What you can build: Anything useful to insurance evaluators.
These ideas are fantastic. I am excited to see how entrepreneurs utilize GPT-4 Vision for diverse applications. Thanks Darko for sharing this insightful piece.
How risky is building a wrapper around this?
Awesome ideas! Thank you for sharing! Funny enough, I’ve actually launched an insight generator back in September :)
Wow, this is brilliant!!!
There's literally endless possibilities!
Certainly, here are 20 concise SaaS ideas using GPT-4 Vision:
Content generation
Code generation
Video storyboards
Graphic design
E-commerce personalization
Virtual fashion stylist
Language translation
Visual content summarizer
Medical image analysis
Architectural blueprints
Virtual tours
Artwork authentication
Automated video editing
Social media scheduling
Food recipe generator
Home organization
Mood-based music playlists
Environmental impact analysis
Sports performance analysis
Document signature verification
These ideas leverage advanced image analysis and text generation capabilities for diverse applications.
Let's try
Great ideas! Here are some I came up with:
Here is a list of another 550+ products you can build using GPT-4 - https://www.indiehackers.com/post/50-side-projects-making-1m-f51ce727df