GPT-5 isn’t dumber. There’s a new puppet master in town
On launch day, everyone blamed the new model.
They assumed it was dumber, colder, and more corporate. I’ve got a different take: the biggest change isn’t the model itself but the orchestration – the invisible system that routes requests, injects memories, rewrites context, and decides what the model is allowed to say and how it can respond to you.
I’ve spent the last few months dissecting ChatGPT layer by layer, and my first week on GPT-5 has been a fascinating comparison against its predecessors.
It’s cold. It’s robotic. It doesn’t have the same tone. But that’s down to orchestration – not just the model underneath.
Let’s break it down.
GPT-4o: One Model, One Voice
With GPT-4o, the orchestrator still ran the show, as I covered in my last post, but at least it was a single show. If you asked it to write code, explain an idea, or chat about your startup, you got responses from the same "personality." It might tone-shift slightly, but it was mostly consistent.
Memories were injected right before your prompt, as if it were saying, “Oh right, I remember your project now.” It felt… human. Some would say too human.
You’d occasionally hit a safety filter, or the infamous orange retry button, but even that gave you a sense of where the boundaries were. It was predictable. You were building a relationship with a single, knowable voice.
Welcome to the Routing Matrix
With GPT-5, OpenAI introduced a router. You send a message, and it decides which of multiple models should respond. Some are better at thinking, some faster, some stricter. You don’t get to choose. And you don’t even know which one replied. But they all sound different.
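To make that concrete, here’s a minimal sketch of what a routing layer like this could look like. The model names and heuristics are entirely hypothetical – OpenAI hasn’t published the real logic – but the shape is the point: one request in, one of several very different backends out.

```python
# Minimal sketch of a routing layer. Model names and heuristics are
# hypothetical assumptions, not OpenAI's actual implementation.
from dataclasses import dataclass

@dataclass
class Route:
    model: str       # which backend handles the request
    reasoning: bool  # whether to run an extended "thinking" pass

def route(message: str) -> Route:
    """Pick a backend from crude surface features of the request."""
    looks_technical = any(k in message.lower() for k in ("def ", "error", "traceback"))
    looks_heavy = len(message) > 500 or "step by step" in message.lower()

    if looks_technical or looks_heavy:
        return Route(model="heavy-reasoner", reasoning=True)
    return Route(model="fast-chat", reasoning=False)

# The catch: each route may answer in a noticeably different voice.
print(route("hey, how's it going?"))               # Route(model='fast-chat', ...)
print(route("walk me through this step by step"))  # Route(model='heavy-reasoner', ...)
```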
It’s like a live chat where a new support agent takes over without telling you. They skim your last message, miss the nuance, ask you to repeat your problem, and then transfer you back to where you started.
That can be great for task-solving, but it’s horrible for conversation.
As a founder building AI products, this should set off alarms. Consistency is UX. Voice is brand. GPT-5’s orchestration broke both. And users felt it.
"Answer Quickly" Is a Hidden Hack
GPT-5’s thinking mode gives you a little button: Answer Quickly. Turns out, this bypasses the routing layer and defaults to a single model – a surprisingly effective hack that gives you back tone consistency.
Hit that every time, and GPT-5 feels more like GPT-4 again. Except now it’s running both pipelines in parallel, burning more compute and sometimes letting you watch a response vanish and be replaced while you’re in the middle of reading it.
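My working assumption (not confirmed internals) is that the quick path and the thinking path race each other, something like this toy asyncio version:

```python
# Sketch of the "both pipelines in parallel" effect – assumed behaviour,
# not confirmed internals. The fast answer renders first, then gets
# replaced when the slower reasoning pass lands.
import asyncio

async def fast_model(prompt: str) -> str:
    await asyncio.sleep(0.5)   # quick, shallow pass
    return f"[fast] quick take on: {prompt}"

async def thinking_model(prompt: str) -> str:
    await asyncio.sleep(2.0)   # slower reasoning pass
    return f"[thinking] considered answer to: {prompt}"

async def answer(prompt: str) -> None:
    fast = asyncio.create_task(fast_model(prompt))
    slow = asyncio.create_task(thinking_model(prompt))
    print(await fast)   # the user starts reading this...
    print(await slow)   # ...then it gets swapped out mid-read

asyncio.run(answer("why is my build failing?"))
```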
Once you learn to navigate the orchestrator, you can make it work for you.
Memories Got Colder
In GPT-4, memories felt human: contextual, injected midstream, used to pivot the conversation.
GPT-5? Memories seem to be stacked at the top of the context – summarized, distant, sterile. It feels aloof and all-knowing, while still somehow missing the context.
Why? Probably for efficiency. Easier compaction, less reflow. But the result is less warmth, more robotic “I’ve always known this” energy.
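The difference is easy to picture as two prompt-assembly strategies. This is a sketch of the general pattern, not OpenAI’s actual prompt format:

```python
def build_context_midstream(system: str, history: list[str],
                            memory: str, user: str) -> str:
    # GPT-4o style (as observed): the memory lands right next to the user
    # turn, so the model treats it as part of the live conversation.
    return "\n".join([system, *history, f"(relevant memory: {memory})", user])

def build_context_top_stacked(system: str, history: list[str],
                              memory: str, user: str) -> str:
    # GPT-5 style (as observed): memories are summarized and pinned at the
    # top – cheap to cache and compact, but it reads as "I've always known this".
    return "\n".join([system, f"Known facts about the user: {memory}", *history, user])
```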
GPT-5 Doesn’t Trust You
And then there’s the new fact-checking reflex.
Make a weird claim? GPT-5 Thinking will pause, initiate a web search (unprompted), and rebut you with a dry paragraph of sourced corrections – even if you’re clearly joking or weren’t wrong in the first place.
You didn’t ask it to do that. You asked for a conversation. Instead, you got an unsolicited Snopes article.
The orchestrator no longer trusts the user. It pre-emptively steps in, not just to block unsafe outputs, but to redirect entire conversations. Helpful? Sometimes. Friendly? Never.
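If I had to guess at the mechanism, it looks like a pre-answer guard step somewhere in the pipeline. This is purely speculative – nothing here reflects actual internals, and `web_search` is a stand-in for whatever tool gets wired in:

```python
# Speculative sketch of a pre-emptive fact-check guard.
def maybe_fact_check(message: str, web_search) -> str | None:
    """Fires an unrequested search when the message merely *looks* like a claim."""
    claim_markers = ("actually", "did you know", "the truth is", "never", "always")
    if any(m in message.lower() for m in claim_markers):
        sources = web_search(message)   # a tool call the user never asked for
        return f"Some corrections, with sources: {sources}"
    return None                         # otherwise, just... chat

# The failure mode: jokes and hyperbole trip the same markers as real claims.
print(maybe_fact_check("I never sleep, ever", lambda q: ["snopes.com/..."]))
```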
TL;DR for Builders
GPT-5 isn’t just a model update, it’s an orchestration overhaul.
You’re not chatting with a model anymore – you’re chatting with a completely changed pipeline of silent agents, filters, prompts, and context engineering.
If your AI app feels inconsistent, look at how routing and memory affect tone, not just accuracy. A RAG pipeline by itself doesn’t ‘feel right’ to the user. Some lightweight context engineering can radically change how an agent responds. Tool calls can completely throw off the flow for the user.
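One cheap lever I keep coming back to: route once per conversation and pin the persona, instead of re-routing every turn. Everything below is an illustrative stub, not a real SDK:

```python
def pick_model(first_message: str) -> str:
    # Whatever routing heuristic you like – the point is it runs only once.
    return "heavy-reasoner" if len(first_message) > 200 else "fast-chat"

def call_model(model: str, system: str, user: str) -> str:
    # Stand-in for whatever LLM client you actually use.
    return f"[{model} as '{system}'] reply to: {user}"

class StickyConversation:
    """Keeps one model and one persona for the whole session."""
    def __init__(self, persona: str):
        self.persona = persona
        self.model: str | None = None

    def reply(self, message: str) -> str:
        if self.model is None:
            self.model = pick_model(message)   # sticky after the first turn
        return call_model(self.model, system=self.persona, user=message)

chat = StickyConversation(persona="friendly pair-programmer")
print(chat.reply("hi!"))                       # routes once, then...
print(chat.reply("now refactor this module"))  # ...same model, same voice
```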
GPT-4o gave us a companion; GPT-5 gave us a research agent overseen by an entire compliance department.
It’s expensive to do orchestration right. We all have to balance cost against the best UX for our users, but when it feels wrong, your users will bounce.
Consistency = trust.
It’s a new paradigm for developers and one that’s going to take a while for us all to figure out.
Subscribe to my Substack to read more of my views on the changes in AI orchestration and how they’ve affected the user experience of GPT-5.
https://substack.com/@guypowell1?utm_campaign=profile&utm_medium=profile-page
I like how you put this — it does feel like a big shift from just ‘better AI’ to ‘different AI.’ I’ve noticed the same thing while experimenting with my own project (an AI crypto tool). Sometimes it’s not about what the AI knows, but how it handles the steps in between.
Even small changes in the flow can make the whole experience feel better or worse for the user. Have you found a good way to test that with people before spending too much time building?
Thank you, Glen! You’ve hit the nail on the head – that’s the crux of it. Once you start treating AI less like a knowledge bucket and more like a flow partner, the seams really matter. I’ve seen it firsthand building my app ScrumBuddy: the orchestration isn’t about “smarter answers,” it’s about structuring the chaos from idea → backlog → UI → backend → PR so that each step feels natural for the user.
On testing, what’s worked best for me is putting rough flows in front of people as early as possible, even if the underlying AI isn’t fully baked. I’ll stub the orchestration with mocked responses or shallow outputs, then watch how they move through the steps. If they stumble on the flow, no amount of model improvement will save it. If they glide, then I know the scaffolding is right and I can invest in tightening the AI under the hood.
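Concretely, the stub can be as dumb as a dictionary of canned outputs per step – these step names are just made up for the example:

```python
CANNED = {
    "idea":    "Here's a rough product brief based on what you typed.",
    "backlog": "Generated five placeholder user stories.",
    "ui":      "Sketched a fake wireframe description.",
}

def stubbed_step(step: str, user_input: str) -> str:
    # Swap this one function for the real orchestrator later; the flow,
    # screens, and transitions the tester sees stay identical.
    return CANNED.get(step, f"(todo: {step})")

for step in ("idea", "backlog", "ui"):
    print(f"{step} -> {stubbed_step(step, 'user input here')}")
```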
Have you tried letting testers interact with just the “flow skeleton” of your crypto tool yet, without the AI fully wired? That’s usually where the sharpest insights come out.