DX Today | No-Hype Podcast & News About AI & DX

GPT-5.4: Steerable Reasoning, Computer-Use Agents, and the Million-Token Context Shift - March 30, 2026


Join Chris and Laura for a deep-dive conversation into OpenAI's GPT-5.4 release and what it means for the AI ecosystem.

Today's Topic: GPT-5.4 isn't just another model upgrade. It introduces steerable reasoning, native computer-use capabilities, tool search that cuts token usage by 47%, and a million-token context window. Chris and Laura break down what's actually new, why agentic AI is now a first-class product category, and what builders should be paying attention to.

The DX Today Podcast brings you daily deep dives into the most consequential stories in the AI ecosystem. Hosted by Chris and Laura.
SPEAKER_01

Welcome to the DX Today Podcast, your daily deep dive into the AI ecosystem. I'm Chris, and joining me as always is Laura.

SPEAKER_00

Hey Chris, I'm excited for this one because it's the kind of release that changes how people actually work day-to-day with AI.

SPEAKER_01

Same. One topic today: OpenAI's GPT-5.4. Not as a "new model dropped" headline, but what's conceptually new here: steerable reasoning, agentic workflows, and this claim of native computer use plus a huge context window.

SPEAKER_00

Exactly. It's like they're saying, stop thinking of the model as a chat buddy, start thinking of it as a coworker that can plan, use tools, and keep track of a long job.

SPEAKER_01

Let's anchor the basics. GPT 5.4 was announced March 5th, 2026, and OpenAI frames it as their most capable and efficient frontier model for professional work across reasoning, coding, and agentic workflows.

SPEAKER_00

And importantly, it's not only an API release. In ChatGPT, the flagship is GPT 5.4 thinking. It replaces GPT 5.2 thinking for paid tiers, and there's a three-month runway where 5.2 stays available as a legacy option before retirement.

SPEAKER_01

Okay, here's the first thing I want to unpack: steerability. OpenAI's pitch is that for longer, complex tasks, GPT-5.4 thinking can give you an upfront plan, like a preamble to how it intends to solve the problem, and you can redirect it mid-response.

SPEAKER_00

That sounds small, but it's a big interaction design change. Traditionally, you prompt, it answers, and if it goes off track, you correct it after the fact. This is like you can interrupt the train while it's still laying track.

SPEAKER_01

Devil's advocate: is this just "it prints an outline first," or does it actually change outcomes?

SPEAKER_00

I think it changes outcomes because it changes user behavior. If the model shows you a plan early, you catch the wrong assumption in step one instead of after it's built a whole spreadsheet, or a whole analysis, or a whole code base edit.

SPEAKER_01

That's huge for professional work. Misalignment is expensive when you're asking for a multi-step output.

SPEAKER_00

Also, OpenAI explicitly frames steerability as a headline capability of GPT-5.4 thinking. That's a big claim. Fewer corrective turns, more accurate first-pass work.

SPEAKER_01

Let's talk about the other pillar. Agentic workflows. OpenAI says GPT 5.4 brings together advances in reasoning, coding, and agentic workflows. And it incorporates the coding capabilities of GPT 5.3 codex.

SPEAKER_00

Translation, they're merging the strong coder model DNA into the mainline reasoning model. That's consistent with what a lot of teams want. One model that can reason about a task, write code, run the code, inspect results, and keep going.

SPEAKER_01

That "run the code" part matters because agentic systems aren't just writing text, they're operating in an environment.

SPEAKER_00

Right. And OpenAI calls out computer use as a native capability in the API and Codex. They describe it as the first general-purpose model with state-of-the-art computer-use capabilities.

SPEAKER_01

When I hear that, I think screenshots, mouse clicks, keyboard commands, web apps, file systems, AI acting like a junior operator.

SPEAKER_00

That's basically what they're hinting at. They specifically mention mouse or keyboard actions based on screenshots and using automation libraries like Playwright.
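To make that concrete, here's a minimal sketch of what a screenshot-driven computer-use loop might look like. Everything here is illustrative: the action schema, the `model_decide` stub, and the scripted actions are assumptions, not OpenAI's API. In a real agent, the observe step would capture an actual screenshot and the act step would drive a library like Playwright.

```python
# A minimal sketch of a screenshot-driven computer-use loop.
# model_decide() and the action schema are hypothetical stand-ins;
# a real agent would send screenshots to the model and execute the
# returned actions via an automation library like Playwright.

def model_decide(screenshot, goal, step):
    """Stub for the model call: returns one UI action per turn."""
    script = [
        {"type": "click", "x": 120, "y": 48},   # open the search box
        {"type": "type", "text": goal},         # enter the query
        {"type": "key", "key": "Enter"},        # submit
        {"type": "done"},                       # task finished
    ]
    return script[min(step, len(script) - 1)]

def run_agent(goal, max_steps=10):
    """Loop: observe -> decide -> act, with a hard step budget."""
    actions_taken = []
    for step in range(max_steps):
        screenshot = f"<frame {step}>"          # real code: page.screenshot()
        action = model_decide(screenshot, goal, step)
        if action["type"] == "done":
            break
        actions_taken.append(action)            # real code: page.mouse.click(...), etc.
    return actions_taken

trace = run_agent("quarterly revenue report")
print(len(trace))  # 3 actions before "done"
```

The `max_steps` budget is the important design choice: an agent operating a UI needs a hard stop, because a confused loop that keeps clicking is worse than one that gives up.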

SPEAKER_01

Let's pause and ask, what problem is OpenAI solving with native computer use instead of just tool APIs?

SPEAKER_00

Great question. Tool APIs are clean when they exist, but the real world is messy. So much enterprise work is locked behind UIs. Internal dashboards, weird vendor portals, PDFs. Download this CSV, then re-upload it. Computer use agents can navigate those last mile gaps.

SPEAKER_01

So computer use is like a compatibility layer for the whole internet and all internal software.

SPEAKER_00

Exactly. It's brittle sometimes, but it's universal. If a human can do it, in principle, an agent can do it.

SPEAKER_01

And then there's the headline context claim. Up to 1M tokens of context in the API and Codex.

SPEAKER_00

That's the long job enabler. If you want an agent to run a multi-hour task, read a big repository, digest documents, keep notes, compare versions, context becomes the bottleneck.

SPEAKER_01

But 1M tokens doesn't mean it remembers everything equally. It means it can ingest a lot. The real question is, can it retrieve and reason over the right parts under pressure?

SPEAKER_00

True. And OpenAI includes long context evaluations, but I'd frame it like this: long context isn't magic memory. It's more like giving the agent a bigger desk to spread out papers.

SPEAKER_01

Nice analogy. Bigger desk helps, but you still need good organization.

SPEAKER_00

And OpenAI tries to address organization via another feature, tool search. They talk about agents working across large ecosystems of tools and connectors, and the model being able to find the right tool efficiently.

SPEAKER_01

That's interesting because once you have dozens or hundreds of tools, the "which function do I call" problem becomes its own cognitive load.

SPEAKER_00

Exactly. And tool search is like letting the agent browse a menu instead of forcing you to hand it the entire menu every time.
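A rough sketch of that idea: instead of sending the model every tool schema on every call, retrieve only the few tools relevant to the current request. The tool names, descriptions, and naive keyword-overlap scoring below are all invented for illustration; a production system would more likely rank tools with embeddings.

```python
# Sketch of "tool search": score each tool's description against the
# request and pass only the top matches to the model, instead of the
# whole catalog. Tools and scoring are illustrative assumptions.

STOPWORDS = {"the", "a", "an", "for", "to", "in", "of", "from"}

TOOLS = {
    "send_email":    "send an email message to a recipient",
    "query_sales":   "query the sales database for revenue figures",
    "create_ticket": "create a support ticket in the issue tracker",
    "fetch_weather": "fetch the current weather for a city",
}

def search_tools(request, top_k=2):
    """Return up to top_k tool names with nonzero keyword overlap."""
    words = set(request.lower().split()) - STOPWORDS
    scored = sorted(
        ((len(words & set(desc.split())), name) for name, desc in TOOLS.items()),
        reverse=True,
    )
    return [name for score, name in scored[:top_k] if score > 0]

print(search_tools("query revenue from the sales database"))  # ['query_sales']
```

The token savings come from the schemas you don't send: with hundreds of tools, shipping the full catalog on every turn dominates the prompt.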

SPEAKER_01

Okay, now the part everyone argues about metrics. OpenAI says GPT 5.4 is more token efficient than GPT-5.2, using fewer tokens to solve problems. And that translates into faster speeds and lower cost.

SPEAKER_00

The token efficiency claim is important because models keep getting more capable, but also more expensive. If they can do more with fewer tokens, it's basically a performance per dollar story.

SPEAKER_01

They also claim improved factuality. Individual claims 33% less likely to be false, and full responses 18% less likely to contain any errors relative to GPT-5.2 based on internal measurement.

SPEAKER_00

That's a big deal for agentic workflows. A hallucination in a chat answer is annoying. A hallucination in a tool-using agent can be catastrophic. Wrong email sent, wrong data pasted, wrong script run.

SPEAKER_01

Right. Compounding errors. In an agent loop, small failure rates multiply.
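The arithmetic behind that point: if each step succeeds independently with probability p, an n-step task succeeds with probability p to the n, which decays quickly. The numbers below are illustrative, not from OpenAI's evaluations.

```python
# Why small per-step failure rates matter in agent loops:
# independent per-step success p compounds to p**n over n steps.

def task_success(per_step_success, n_steps):
    """End-to-end success probability for an n-step agent task."""
    return per_step_success ** n_steps

# Even a 99%-reliable step leaves a 50-step task failing ~40% of the time.
for n in (5, 20, 50):
    print(n, round(task_success(0.99, n), 3))  # 0.951, 0.818, 0.605
```

That's why a modest-sounding reduction in per-claim error rates can translate into a large improvement in whole-task reliability for long agent runs.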

SPEAKER_00

Another interesting benchmark angle, they highlight improvements on BrowseComp, which is basically the model's ability to do deep web research for specific queries.

SPEAKER_01

Which connects back to the steerability idea. If it's researching over multiple rounds, you want the ability to steer before it goes down a rabbit hole.

SPEAKER_00

Totally.

SPEAKER_01

Now, safety. This is where it gets nuanced. OpenAI talks about chain-of-thought monitorability and introduces an open-source evaluation called CoT controllability, measuring whether models can deliberately obfuscate reasoning to evade monitoring.

SPEAKER_00

And they claim GPT-5.4 thinking's ability to control or hide its chain of thought is low, which they present as a positive safety property.

SPEAKER_01

Here's my pushback. What does low controllability actually mean in practice? That the model can't intentionally fake its reasoning trail?

SPEAKER_00

That's the implication. It's basically if you're monitoring reasoning traces for risky behavior, a model that can't strategically hide its reasoning is easier to supervise.

SPEAKER_01

But there's a separate question. Do we actually get to see the full chain of thought in products? Often we don't.

SPEAKER_00

Right. The safety framing could be more about internal monitoring systems than user-visible text.

SPEAKER_01

Another safety note from OpenAI. They treat GPT 5.4 as high cyber capability under their preparedness framework and deploy it with additional protections.

SPEAKER_00

That's the dual use reality. If the model is stronger at operating tools and computers, it can be more helpful for defense and also more dangerous for offense.

SPEAKER_01

So the core tension is make agents powerful enough to do real work, but keep them from being powerful in the wrong direction.

SPEAKER_00

Exactly. And a lot of this release is about making agents more reliable, more grounded, more tool accurate, more context aware.

SPEAKER_01

Let's shift into how should a normal team actually use this? Because people hear steerability and might not know what to do differently.

SPEAKER_00

Practical pattern, ask for the plan first, then approve or edit it. You can literally say, propose a plan and wait. That turns a monolithic prompt into a two-phase workflow.
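That two-phase pattern can be sketched as a message sequence. The `chat` function here is a stub standing in for a real model call; the point is the structure: request a plan, pause for human review, then send explicit approval before anything executes.

```python
# Sketch of the "plan first, then approve" pattern. chat() is a
# hypothetical stand-in for an actual model call; what matters is
# the two-phase message flow, not the API.

def chat(messages):
    """Stub model: returns a plan on request, executes once approved."""
    last = messages[-1]["content"]
    if "propose a plan" in last.lower():
        return "PLAN: 1) load data 2) compute totals 3) draft summary"
    return "RESULT: summary drafted per approved plan"

history = [{"role": "user",
            "content": "Summarize Q3 sales. Propose a plan and wait."}]
plan = chat(history)
# A human reviews the plan here; edit or reject before anything runs.
history += [{"role": "assistant", "content": plan},
            {"role": "user", "content": "Approved, proceed."}]
result = chat(history)
print(plan.startswith("PLAN:"), result.startswith("RESULT:"))
```

The review comment in the middle is the whole feature: misalignment gets caught at the plan stage, before any expensive multi-step output is produced.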

SPEAKER_01

That's basically project management for an AI.

SPEAKER_00

Another pattern, define confirmation policies. If an agent is going to click buttons in a UI, you often want ask before irreversible actions. OpenAI mentions configurable safety behavior with custom confirmation policies.
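A confirmation policy can be as simple as a gate in front of irreversible actions. The action names and policy shape below are illustrative assumptions, not OpenAI's configuration API; the idea is that anything that can't be undone waits for an explicit human approval callback.

```python
# Sketch of a confirmation policy: irreversible actions require an
# explicit approval callback before they execute. Action names and
# the policy shape are illustrative, not a real API.

IRREVERSIBLE = {"send_email", "delete_file", "submit_payment"}

def execute(action, args, approve):
    """Run an action, pausing for approval when it can't be undone."""
    if action in IRREVERSIBLE and not approve(action, args):
        return f"blocked: {action} awaiting human approval"
    return f"ran: {action}"

# Deny-everything approver: drafts proceed, sends are held.
deny = lambda action, args: False
print(execute("draft_email", {"to": "team"}, deny))  # ran: draft_email
print(execute("send_email",  {"to": "team"}, deny))  # blocked: send_email awaiting human approval
```

In practice the `approve` callback would surface a prompt to a human, and the deny list would be configured per deployment rather than hardcoded.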

SPEAKER_01

So like draft the email, but don't hit send.

SPEAKER_00

Exactly. Or identify the file to delete, but require approval.

SPEAKER_01

If you're building an agent, you also need observability. Logs of actions, screenshots, diffs, and checkpoints.

SPEAKER_00

Yes, because when an agent makes a mistake, you need to diagnose it. Agentic systems need debugging tools like any software system.

SPEAKER_01

Another question: Does bigger context make prompt engineering obsolete?

SPEAKER_00

No, it changes it. Prompting becomes less about cramming everything into one message and more about maintaining a clear objective, constraints, and a consistent evaluation loop.

SPEAKER_01

Like you're designing a workflow, not a sentence.

SPEAKER_00

Exactly. And the upfront plan is like an interface to that workflow.

SPEAKER_01

Let's talk about the Pro tier briefly. OpenAI also released GPT-5.4 Pro for maximum performance on complex tasks.

SPEAKER_00

Which probably means higher compute per answer. If you're doing mission-critical work, like generating a big financial model or a complicated code refactor, you might pay for the extra robustness.

SPEAKER_01

Now, outside perspective. Some coverage frames this as a competitive move: OpenAI responding to pressure from Anthropic and Google.

SPEAKER_00

And that's real. But I'd argue the bigger story is the product category shift from chatbots to agents.

SPEAKER_01

Yes. The release reads like an agent platform announcement. Long context, tool search, computer use, better web research.

SPEAKER_00

And the core user experience change, steerability, acknowledges something that's been true for a while. People don't want a black box answer. They want a controllable process.

SPEAKER_01

Okay, skepticism moment. What could go wrong with this mid-response steering concept?

SPEAKER_00

If users overtrust the plan, a plan can be plausible but flawed. Also, mid-response steering could give a false sense of control if the underlying reasoning still drifts.

SPEAKER_01

And agentic computer use: UI automation can be fragile. One layout change, and your agent clicks the wrong thing.

SPEAKER_00

Exactly, which is why confirmation and verification steps matter. Agents need to read back what they're doing, check results, and fail safely.

SPEAKER_01

So if we summarize the release in one sentence: GPT-5.4 is OpenAI betting that the future is controllable, tool-using agents that can operate in real software environments, not just produce text.

SPEAKER_00

And they're adding three ingredients for that: better reasoning, better coding, and better operational capability. Computer use, tool search, long context.

SPEAKER_01

Last question, Laura. What's the most underrated part of this release?

SPEAKER_00

The subtle shift toward workflow design. The model giving an upfront plan is basically inviting humans to become managers of AI work streams. That's a new skill.

SPEAKER_01

And the most overrated part?

SPEAKER_00

Probably the raw context number. It's impressive, but real performance depends on retrieval and focus, not just how much you can stuff in.

SPEAKER_01

That's a good note to end on. Capability is not just scale, it's reliability and controllability.

SPEAKER_00

Exactly.

SPEAKER_01

That's all for today's episode of the DX Today podcast. Thanks for listening, and we'll see you next time.