DX Today | No-Hype Podcast & News About AI & DX

GPT-5.4: Steerable Reasoning, Computer-Use Agents, and the Million-Token Context Shift - March 30, 2026


Join Chris and Laura for a deep-dive conversation into OpenAI's GPT-5.4 release and what it means for the AI ecosystem.

Today's Topic: GPT-5.4 isn't just another model upgrade. It introduces steerable reasoning, native computer-use capabilities, tool search that cuts token usage by 47%, and a million-token context window. Chris and Laura break down what's actually new, why agentic AI is now a first-class product category, and what builders should be paying attention to.

The DX Today Podcast brings you daily deep dives into the most consequential stories in the AI ecosystem. Hosted by Chris and Laura.
SPEAKER_01

Welcome to the DX Today Podcast, your daily deep dive into the AI ecosystem. I'm Chris, and joining me as always is Laura.

SPEAKER_00

Hey Chris, I'm excited for this one because it's the kind of release that changes how people actually work day-to-day with AI.

SPEAKER_01

Same. One topic today: OpenAI's GPT-5.4. Not as a "new model dropped" headline, but what's conceptually new here: steerable reasoning, agentic workflows, and this claim of native computer use plus a huge context window.

SPEAKER_00

Exactly. It's like they're saying, stop thinking of the model as a chat buddy, start thinking of it as a coworker that can plan, use tools, and keep track of a long job.

SPEAKER_01

Let's anchor the basics. GPT 5.4 was announced March 5th, 2026, and OpenAI frames it as their most capable and efficient frontier model for professional work across reasoning, coding, and agentic workflows.

SPEAKER_00

And importantly, it's not only an API release. In ChatGPT, the flagship is GPT 5.4 thinking. It replaces GPT 5.2 thinking for paid tiers, and there's a three-month runway where 5.2 stays available as a legacy option before retirement.

SPEAKER_01

Okay, here's the first thing I want to unpack: steerability. OpenAI's pitch is that for longer, complex tasks, GPT-5.4 thinking can give you an upfront plan, like a preamble to how it intends to solve the problem, and you can redirect it mid-response.

SPEAKER_00

That sounds small, but it's a big interaction design change. Traditionally, you prompt, it answers, and if it goes off track, you correct it after the fact. This is like you can interrupt the train while it's still laying track.

SPEAKER_01

Devil's advocate: is this just "it prints an outline first," or does it actually change outcomes?

SPEAKER_00

I think it changes outcomes because it changes user behavior. If the model shows you a plan early, you catch the wrong assumption in step one instead of after it's built a whole spreadsheet, or a whole analysis, or a whole code base edit.

SPEAKER_01

That's huge for professional work. Misalignment is expensive when you're asking for a multi-step output.

SPEAKER_00

Also, OpenAI explicitly frames steerability as a headline capability of GPT-5.4 thinking. That's a big claim. Fewer corrective turns, more accurate first-pass work.

SPEAKER_01

Let's talk about the other pillar. Agentic workflows. OpenAI says GPT 5.4 brings together advances in reasoning, coding, and agentic workflows. And it incorporates the coding capabilities of GPT 5.3 codex.

SPEAKER_00

Translation, they're merging the strong coder model DNA into the mainline reasoning model. That's consistent with what a lot of teams want. One model that can reason about a task, write code, run the code, inspect results, and keep going.

SPEAKER_01

That "run the code" part matters because agentic systems aren't just writing text, they're operating in an environment.

SPEAKER_00

Right. And OpenAI calls out computer use as a native capability in the API and Codex. They describe it as the first general-purpose model with state-of-the-art computer-use capabilities.

SPEAKER_01

When I hear that, I think screenshots, mouse clicks, keyboard commands, web apps, file systems, AI acting like a junior operator.

SPEAKER_00

That's basically what they're hinting at. They specifically mention mouse or keyboard actions based on screenshots and using automation libraries like Playwright.
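To make that concrete, here's a minimal sketch of what a screenshot-driven computer-use loop might look like. Everything here is illustrative: the action schema, the `model_decide` stub, and the scripted actions are assumptions, not OpenAI's API. In a real agent, the observe step would capture an actual screenshot and the act step would drive a library like Playwright.

```python
# A minimal sketch of a screenshot-driven computer-use loop.
# model_decide() and the action schema are hypothetical stand-ins;
# a real agent would send screenshots to the model and execute the
# returned actions via an automation library like Playwright.

def model_decide(screenshot, goal, step):
    """Stub for the model call: returns one UI action per turn."""
    script = [
        {"type": "click", "x": 120, "y": 48},   # open the search box
        {"type": "type", "text": goal},         # enter the query
        {"type": "key", "key": "Enter"},        # submit
        {"type": "done"},                       # task finished
    ]
    return script[min(step, len(script) - 1)]

def run_agent(goal, max_steps=10):
    """Loop: observe -> decide -> act, with a hard step budget."""
    actions_taken = []
    for step in range(max_steps):
        screenshot = f"<frame {step}>"          # real code: page.screenshot()
        action = model_decide(screenshot, goal, step)
        if action["type"] == "done":
            break
        actions_taken.append(action)            # real code: page.mouse.click(...), etc.
    return actions_taken

trace = run_agent("quarterly revenue report")
print(len(trace))  # 3 actions before "done"
```

The `max_steps` budget is the important design choice: an agent operating a UI needs a hard stop, because a confused loop that keeps clicking is worse than one that gives up.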

SPEAKER_01

Let's pause and ask, what problem is OpenAI solving with native computer use instead of just tool APIs?

SPEAKER_00

Great question. Tool APIs are clean when they exist, but the real world is messy. So much enterprise work is locked behind UIs. Internal dashboards, weird vendor portals, PDFs. Download this CSV, then re-upload it. Computer use agents can navigate those last mile gaps.

SPEAKER_01

So computer use is like a compatibility layer for the whole internet and all internal software.

SPEAKER_00

Exactly. It's brittle sometimes, but it's universal. If a human can do it, in principle, an agent can do it.

SPEAKER_01

And then there's the headline context claim. Up to 1M tokens of context in the API and Codex.

SPEAKER_00

That's the long job enabler. If you want an agent to run a multi-hour task, read a big repository, digest documents, keep notes, compare versions, context becomes the bottleneck.

SPEAKER_01

But 1M tokens doesn't mean it remembers everything equally. It means it can ingest a lot. The real question is, can it retrieve and reason over the right parts under pressure?

SPEAKER_00

True. And OpenAI includes long context evaluations, but I'd frame it like this: long context isn't magic memory. It's more like giving the agent a bigger desk to spread out papers.

SPEAKER_01

Nice analogy. Bigger desk helps, but you still need good organization.

SPEAKER_00

And OpenAI tries to address organization via another feature, tool search. They talk about agents working across large ecosystems of tools and connectors, and the model being able to find the right tool efficiently.

SPEAKER_01

That's interesting because once you have dozens or hundreds of tools, the "which function do I call" problem becomes its own cognitive load.

SPEAKER_00

Exactly. And tool search is like letting the agent browse a menu instead of forcing you to hand it the entire menu every time.
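A rough sketch of that idea: instead of sending the model every tool schema on every call, retrieve only the few tools relevant to the current request. The tool names, descriptions, and naive keyword-overlap scoring below are all invented for illustration; a production system would more likely rank tools with embeddings.

```python
# Sketch of "tool search": score each tool's description against the
# request and pass only the top matches to the model, instead of the
# whole catalog. Tools and scoring are illustrative assumptions.

STOPWORDS = {"the", "a", "an", "for", "to", "in", "of", "from"}

TOOLS = {
    "send_email":    "send an email message to a recipient",
    "query_sales":   "query the sales database for revenue figures",
    "create_ticket": "create a support ticket in the issue tracker",
    "fetch_weather": "fetch the current weather for a city",
}

def search_tools(request, top_k=2):
    """Return up to top_k tool names with nonzero keyword overlap."""
    words = set(request.lower().split()) - STOPWORDS
    scored = sorted(
        ((len(words & set(desc.split())), name) for name, desc in TOOLS.items()),
        reverse=True,
    )
    return [name for score, name in scored[:top_k] if score > 0]

print(search_tools("query revenue from the sales database"))  # ['query_sales']
```

The token savings come from the schemas you don't send: with hundreds of tools, shipping the full catalog on every turn dominates the prompt.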

SPEAKER_01

Okay, now the part everyone argues about metrics. OpenAI says GPT 5.4 is more token efficient than GPT-5.2, using fewer tokens to solve problems. And that translates into faster speeds and lower cost.

SPEAKER_00

The token efficiency claim is important because models keep getting more capable, but also more expensive. If they can do more with fewer tokens, it's basically a performance per dollar story.

SPEAKER_01

They also claim improved factuality. Individual claims 33% less likely to be false, and full responses 18% less likely to contain any errors relative to GPT-5.2 based on internal measurement.

SPEAKER_00

That's a big deal for agentic workflows. A hallucination in a chat answer is annoying. A hallucination in a tool-using agent can be catastrophic. Wrong email sent, wrong data pasted, wrong script run.

SPEAKER_01

Right. Compounding errors. In an agent loop, small failure rates multiply.
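The arithmetic behind that point: if each step succeeds independently with probability p, an n-step task succeeds with probability p to the n, which decays quickly. The numbers below are illustrative, not from OpenAI's evaluations.

```python
# Why small per-step failure rates matter in agent loops:
# independent per-step success p compounds to p**n over n steps.

def task_success(per_step_success, n_steps):
    """End-to-end success probability for an n-step agent task."""
    return per_step_success ** n_steps

# Even a 99%-reliable step leaves a 50-step task failing ~40% of the time.
for n in (5, 20, 50):
    print(n, round(task_success(0.99, n), 3))  # 0.951, 0.818, 0.605
```

That's why a modest-sounding reduction in per-claim error rates can translate into a large improvement in whole-task reliability for long agent runs.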

SPEAKER_00

Another interesting benchmark angle, they highlight improvements on BrowseComp, which is basically the model's ability to do deep web research for specific queries.

SPEAKER_01

Which connects back to the steerability idea. If it's researching over multiple rounds, you want the ability to steer before it goes down a rabbit hole.

SPEAKER_00

Totally.

SPEAKER_01

Now, safety. This is where it gets nuanced. OpenAI talks about chain-of-thought monitorability and introduces an open-source evaluation called CoT controllability, measuring whether models can deliberately obfuscate reasoning to evade monitoring.

SPEAKER_00

And they claim GPT-5.4 thinking's ability to control or hide its chain of thought is low, which they present as a positive safety property.

SPEAKER_01

Here's my pushback. What does low controllability actually mean in practice? That the model can't intentionally fake its reasoning trail?

SPEAKER_00

That's the implication. It's basically if you're monitoring reasoning traces for risky behavior, a model that can't strategically hide its reasoning is easier to supervise.

SPEAKER_01

But there's a separate question. Do we actually get to see the full chain of thought in products? Often we don't.

SPEAKER_00

Right. The safety framing could be more about internal monitoring systems than user-visible text.

SPEAKER_01

Another safety note from OpenAI. They treat GPT 5.4 as high cyber capability under their preparedness framework and deploy it with additional protections.

SPEAKER_00

That's the dual use reality. If the model is stronger at operating tools and computers, it can be more helpful for defense and also more dangerous for offense.

SPEAKER_01

So the core tension is make agents powerful enough to do real work, but keep them from being powerful in the wrong direction.

SPEAKER_00

Exactly. And a lot of this release is about making agents more reliable, more grounded, more tool accurate, more context aware.

SPEAKER_01

Let's shift into how should a normal team actually use this? Because people hear steerability and might not know what to do differently.

SPEAKER_00

Practical pattern, ask for the plan first, then approve or edit it. You can literally say, propose a plan and wait. That turns a monolithic prompt into a two-phase workflow.
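That two-phase pattern can be sketched as a message sequence. The `chat` function here is a stub standing in for a real model call; the point is the structure: request a plan, pause for human review, then send explicit approval before anything executes.

```python
# Sketch of the "plan first, then approve" pattern. chat() is a
# hypothetical stand-in for an actual model call; what matters is
# the two-phase message flow, not the API.

def chat(messages):
    """Stub model: returns a plan on request, executes once approved."""
    last = messages[-1]["content"]
    if "propose a plan" in last.lower():
        return "PLAN: 1) load data 2) compute totals 3) draft summary"
    return "RESULT: summary drafted per approved plan"

history = [{"role": "user",
            "content": "Summarize Q3 sales. Propose a plan and wait."}]
plan = chat(history)
# A human reviews the plan here; edit or reject before anything runs.
history += [{"role": "assistant", "content": plan},
            {"role": "user", "content": "Approved, proceed."}]
result = chat(history)
print(plan.startswith("PLAN:"), result.startswith("RESULT:"))
```

The review comment in the middle is the whole feature: misalignment gets caught at the plan stage, before any expensive multi-step output is produced.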

SPEAKER_01

That's basically project management for an AI.

SPEAKER_00

Another pattern, define confirmation policies. If an agent is going to click buttons in a UI, you often want ask before irreversible actions. OpenAI mentions configurable safety behavior with custom confirmation policies.
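A confirmation policy can be as simple as a gate in front of irreversible actions. The action names and policy shape below are illustrative assumptions, not OpenAI's configuration API; the idea is that anything that can't be undone waits for an explicit human approval callback.

```python
# Sketch of a confirmation policy: irreversible actions require an
# explicit approval callback before they execute. Action names and
# the policy shape are illustrative, not a real API.

IRREVERSIBLE = {"send_email", "delete_file", "submit_payment"}

def execute(action, args, approve):
    """Run an action, pausing for approval when it can't be undone."""
    if action in IRREVERSIBLE and not approve(action, args):
        return f"blocked: {action} awaiting human approval"
    return f"ran: {action}"

# Deny-everything approver: drafts proceed, sends are held.
deny = lambda action, args: False
print(execute("draft_email", {"to": "team"}, deny))  # ran: draft_email
print(execute("send_email",  {"to": "team"}, deny))  # blocked: send_email awaiting human approval
```

In practice the `approve` callback would surface a prompt to a human, and the deny list would be configured per deployment rather than hardcoded.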

SPEAKER_01

So like draft the email, but don't hit send.

SPEAKER_00

Exactly. Or identify the file to delete, but require approval.

SPEAKER_01

If you're building an agent, you also need observability. Logs of actions, screenshots, diffs, and checkpoints.

SPEAKER_00

Yes, because when an agent makes a mistake, you need to diagnose it. Agentic systems need debugging tools like any software system.

SPEAKER_01

Another question: Does bigger context make prompt engineering obsolete?

SPEAKER_00

No, it changes it. Prompting becomes less about cramming everything into one message and more about maintaining a clear objective, constraints, and a consistent evaluation loop.

SPEAKER_01

Like you're designing a workflow, not a sentence.

SPEAKER_00

Exactly. And the upfront plan is like an interface to that workflow.

SPEAKER_01

Let's talk about the Pro tier briefly. OpenAI also released GPT-5.4 Pro for maximum performance on complex tasks.

SPEAKER_00

Which probably means higher compute per answer. If you're doing mission-critical work, like generating a big financial model or a complicated code refactor, you might pay for the extra robustness.

SPEAKER_01

Now, outside perspective. Some coverage frames this as a competitive move: OpenAI responding to pressure from Anthropic and Google.

SPEAKER_00

And that's real. But I'd argue the bigger story is the product category shift from chatbots to agents.

SPEAKER_01

Yes. The release reads like an agent platform announcement. Long context, tool search, computer use, better web research.

SPEAKER_00

And the core user experience change, steerability, acknowledges something that's been true for a while. People don't want a black box answer. They want a controllable process.

SPEAKER_01

Okay, skepticism moment. What could go wrong with this mid-response steering concept?

SPEAKER_00

If users overtrust the plan, a plan can be plausible but flawed. Also, mid-response steering could give a false sense of control if the underlying reasoning still drifts.

SPEAKER_01

And agentic computer use: UI automation can be fragile. One layout change, and your agent clicks the wrong thing.

SPEAKER_00

Exactly, which is why confirmation and verification steps matter. Agents need to read back what they're doing, check results, and fail safely.

SPEAKER_01

So if we summarize the release in one sentence: GPT-5.4 is OpenAI betting that the future is controllable, tool-using agents that can operate in real software environments, not just produce text.

SPEAKER_00

And they're adding three ingredients for that: better reasoning, better coding, and better operational capability. Computer use, tool search, long context.

SPEAKER_01

Last question, Laura. What's the most underrated part of this release?

SPEAKER_00

The subtle shift toward workflow design. The model giving an upfront plan is basically inviting humans to become managers of AI work streams. That's a new skill.

SPEAKER_01

And the most overrated part?

SPEAKER_00

Probably the raw context number. It's impressive, but real performance depends on retrieval and focus, not just how much you can stuff in.

SPEAKER_01

That's a good note to end on. Capability is not just scale, it's reliability and controllability.

SPEAKER_00

Exactly.

SPEAKER_01

That's all for today's episode of the DX Today podcast. Thanks for listening, and we'll see you next time.