
If a new social platform replaces Instagram and TikTok, it might not be video — it might be sound inside photos

For the past decade, we’ve assumed social media is evolving toward video.
Instagram made life aesthetic.
TikTok made life dynamic.
And the logic seemed obvious:
the more real it feels, the more it should look like video.
But what if that assumption is incomplete?
Imagine a different kind of feed
You’re scrolling through a future social app.
You stop at a photo.
It looks normal at first — a street, a room, a moment.
But there’s something slightly different:
A small indicator: “3 voices in this moment”
You tap.
And suddenly, the photo is no longer silent.
You hear:
someone laughing from behind the camera
a short sentence someone said off-frame
the background sound of that exact place
voices that don’t perfectly align, because they weren’t meant to be “edited into a video”
You’re not watching a reconstruction.
You’re entering a memory from multiple positions at once.
Then you realize:
this is not a post anymore — it’s a place
That experience is what I started building
I built VoxPho to explore this idea:
What if a photo is not just something you look at…
but something you can step into?
In VoxPho, a single image can hold multiple voice points, each attached to a different position in the photo.
So instead of a single “storyline,” you get:
overlapping perspectives
fragments of real conversations
ambient context that never fits into a video edit
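To make the idea concrete, here is a minimal sketch of how a photo with positioned voice points might be modeled. It is an illustration only, not VoxPho's actual code: the names `VoicePoint`, `SoundPhoto`, `indicator`, and `nearest`, the normalized 0 to 1 coordinates, the file names, and the tap radius are all assumptions.

```python
from dataclasses import dataclass, field
from math import hypot
from typing import List, Optional


@dataclass
class VoicePoint:
    # Position as a fraction of image width/height (0.0 to 1.0); an assumed convention.
    x: float
    y: float
    audio_url: str  # hypothetical reference to the recorded clip
    label: str = ""


@dataclass
class SoundPhoto:
    image_url: str
    voice_points: List[VoicePoint] = field(default_factory=list)

    def indicator(self) -> str:
        """Text for the small badge shown on the photo."""
        n = len(self.voice_points)
        return f"{n} voice{'s' if n != 1 else ''} in this moment"

    def nearest(self, tap_x: float, tap_y: float,
                max_dist: float = 0.1) -> Optional[VoicePoint]:
        """Return the voice point closest to a tap, or None if none is within max_dist."""
        best = min(
            self.voice_points,
            key=lambda p: hypot(p.x - tap_x, p.y - tap_y),
            default=None,
        )
        if best is None or hypot(best.x - tap_x, best.y - tap_y) > max_dist:
            return None
        return best


# Hypothetical example data: three clips pinned to one street photo.
photo = SoundPhoto("street.jpg", [
    VoicePoint(0.2, 0.8, "laugh.m4a", "laughter behind the camera"),
    VoicePoint(0.7, 0.4, "sentence.m4a", "off-frame sentence"),
    VoicePoint(0.5, 0.5, "ambient.m4a", "street ambience"),
])
print(photo.indicator())
```

Storing positions as fractions rather than pixels is one possible design choice: it would keep each voice point anchored to the same spot in the image regardless of screen size.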
Why this feels different from Instagram and TikTok
Instagram gives you a finished frame.
TikTok gives you a curated sequence.
Both assume:
content = something to be watched
But this new model assumes something else:
content = something to be revisited
Not optimized for attention.
Not optimized for performance.
But optimized for memory.
The deeper shift
Video still has one limitation:
It forces everything into a single timeline.
But real moments don’t work like that.
They are:
simultaneous
partial
multi-voiced
sometimes even contradictory
A moment is not one perspective.
It’s a collision of perspectives.
So maybe the next social platform isn’t “better video.”
Maybe it’s not even video at all.
Maybe it looks like this:
a photo you can hear — where every voice inside the moment still exists
VoxPho is my experiment
Not to replace Instagram or TikTok.
But to test a simpler question:
What if we stopped flattening moments into single narratives…
and started preserving them as layered, living memory spaces?
Final thought
If Instagram is identity
and TikTok is attention
then maybe the next generation isn’t more content.
Maybe it’s this:
the ability to step back into a moment — and hear it from every side

on June 16, 2026
  1. 1

    Honestly, the thing I'd be most careful with isn't the product itself.

    It's assuming the behavior you're describing is a social behavior rather than a memory behavior.

    I've seen founders build something people genuinely find fascinating and still struggle because the job users were hiring it for turned out to be different than expected.

    That's the question I'd probably spend the most time on early.
