OpenAI GPT-5.3-Codex: New AI Model Outpaces Anthropic’s Claude Opus 4.6 in Coding Wars

OpenAI on Wednesday released GPT-5.3-Codex, its most capable coding agent to date. The launch was strategically timed to coincide with Anthropic’s unveiling of Claude Opus 4.6, marking what industry observers are calling the opening salvo in a high-stakes competition to capture the enterprise software development market.

The simultaneous announcements come during an already heated week for both AI giants, who are also set to air competing Super Bowl advertisements on Sunday. Executives from both companies have publicly exchanged pointed remarks regarding business models, access, and ethical considerations.

“I love building with this model. It feels like more of a step forward than the benchmarks suggest,” OpenAI CEO Sam Altman wrote on X shortly after the launch. He added, “It was amazing to watch how much faster we were able to ship 5.3-Codex by using 5.3-Codex, and for sure this is a sign of things to come.” This claim – that the model assisted in its own creation – represents a significant milestone in AI development.

According to OpenAI, the Codex team leveraged early versions of GPT-5.3-Codex to debug its own training runs, manage deployment infrastructure, and diagnose test results and evaluations, describing it as “our first model that was instrumental in creating itself.”

Record-Breaking Benchmarks and Performance Gains

GPT-5.3-Codex demonstrates substantial gains across multiple industry benchmarks. The model achieved a score of 57% on SWE-Bench Pro, a rigorous evaluation of real-world software engineering challenges spanning four programming languages. It also scored 77.3% on Terminal-Bench 2.0, which assesses the terminal skills essential for coding agents, and 64% on OSWorld, an agentic computer-use benchmark evaluating performance on productivity tasks within visual desktop environments.

The Terminal-Bench 2.0 result is particularly noteworthy. GPT-5.3-Codex’s 77.3% score is 13.3 percentage points above GPT-5.2-Codex (64.0%) and 15.1 points above the base GPT-5.2 model (62.2%). One user on X observed that this performance “absolutely demolished” Anthropic’s Opus 4.6, which reportedly achieved 65.4% on the same benchmark.
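The gaps between the Terminal-Bench 2.0 scores quoted above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the scores are the ones reported in this article, the Opus 4.6 figure is the unverified number attributed to a user on X, and the helper names `point_gap` and `relative_gain` are invented for this example.

```python
# Terminal-Bench 2.0 scores (percent) as reported in the article.
scores = {
    "GPT-5.3-Codex": 77.3,
    "GPT-5.2-Codex": 64.0,
    "GPT-5.2": 62.2,
    "Claude Opus 4.6": 65.4,  # figure attributed to a user on X, not verified here
}


def point_gap(new: float, old: float) -> float:
    """Absolute difference in percentage points, rounded to one decimal."""
    return round(new - old, 1)


def relative_gain(new: float, old: float) -> float:
    """Improvement relative to the old score, in percent, rounded to one decimal."""
    return round((new - old) / old * 100, 1)


top = scores["GPT-5.3-Codex"]

# Gap between GPT-5.3-Codex and every other model in the table.
gaps = {
    name: point_gap(top, score)
    for name, score in scores.items()
    if name != "GPT-5.3-Codex"
}

for name, gap in gaps.items():
    rel = relative_gain(top, scores[name])
    print(f"GPT-5.3-Codex vs {name}: +{gap} points ({rel}% relative)")
```

A percentage-point gap and a relative gain answer different questions, which is why the sketch reports both: the lead over GPT-5.2-Codex is 13.3 points but roughly a fifth in relative terms.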

OpenAI also claims the new model operates with significantly improved efficiency, requiring less than half the tokens of its predecessor for equivalent tasks and achieving more than 25% faster inference per token. “Notably, GPT-5.3-Codex does so with fewer tokens than any prior model, letting users simply build more,” the company stated.

Beyond Coding: Expanding the Scope of AI Agents

Perhaps more significant than the benchmark improvements is OpenAI’s positioning of GPT-5.3-Codex as a model that transcends pure coding. The company explicitly states that “Codex goes from an agent that can write and review code to an agent that can do nearly anything developers and professionals can do on a computer.”

This expanded capability set includes debugging, deployment, monitoring, writing product requirement documents, editing copy, conducting user research, building slide decks, and analyzing data in spreadsheet applications. The model demonstrates strong performance on GDPval, an OpenAI evaluation released in 2025 that measures performance on well-specified knowledge-work tasks across 44 occupations.

This expansion signals OpenAI’s ambition to capture not just the developer tools market, but the broader enterprise productivity software space – a market currently dominated by established players like Microsoft, Salesforce, and ServiceNow, all of whom are actively integrating AI agents into their platforms.

Cybersecurity Focus and New Safety Protocols

The pivot toward general-purpose computing brings new security considerations. OpenAI disclosed that GPT-5.3-Codex is the first model it classifies as “High capability” for cybersecurity-related tasks under its Preparedness Framework, and the first directly trained to identify software vulnerabilities.

“While we don’t have definitive evidence it can automate cyber attacks end-to-end, we’re taking a precautionary approach and deploying our most comprehensive cybersecurity safety stack to date,” the company stated. Mitigations include dual-use safety training, automated monitoring, trusted access for advanced capabilities, and enforcement pipelines incorporating threat intelligence.

Altman highlighted this development on X, stating: “This is our first model that hits ‘high’ for cybersecurity on our preparedness framework. We are piloting a Trusted Access framework, and committing $10 million in API credits to accelerate cyber defense.”

OpenAI is also expanding the private beta of Aardvark, its security research agent, and partnering with open-source maintainers to provide free codebase scanning for widely used projects. As an example, OpenAI cited Next.js, where a security researcher used Codex to discover vulnerabilities that were disclosed last week.

A Public Rivalry and the Enterprise AI Market

The cybersecurity announcement has been somewhat overshadowed by the increasingly public nature of the OpenAI-Anthropic rivalry. The timing of Wednesday’s release is inextricably linked to Anthropic’s unveiling of Claude Opus 4.6, and the companies are also set to air competing Super Bowl advertisements on Sunday.

Anthropic describes Claude Opus 4.6 as its “smartest model,” capable of more careful planning, sustained agentic tasks, reliable operation in large codebases, and self-correction.

Altman responded to Anthropic’s advertising campaign, calling it “funny” but “clearly dishonest” in an extensive post on X. He characterized Anthropic as an “authoritarian company” that “wants to control what people do with AI,” and asserted that OpenAI serves a broader user base.

The Financial Stakes of the AI Coding Race

The public sparring masks a serious business competition. Both companies are vying for position in a rapidly expanding enterprise AI market. According to survey data from Andreessen Horowitz released this week, enterprise spending on large language models has dramatically outpaced projections. Average enterprise LLM spending reached $7 million in 2025, 180% higher than 2024’s actual spending of $2.5 million – and 56% above projections from just a year prior. Spending is projected to reach $11.6 million per enterprise in 2026.
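The spending figures above are internally consistent, which a short calculation makes easy to see. This is a rough sketch using only the numbers quoted in this article. The `implied_prior_projection` value is an inference rather than a survey figure: it assumes “56% above projections” means the 2025 actual was 1.56 times the earlier projection.

```python
# Average enterprise LLM spending in millions of US dollars, as quoted in the article.
spend_2024 = 2.5   # actual
spend_2025 = 7.0   # actual
spend_2026 = 11.6  # projected


def pct_change(new: float, old: float) -> float:
    """Percentage change from old to new, rounded to one decimal."""
    return round((new - old) / old * 100, 1)


growth_2025 = pct_change(spend_2025, spend_2024)  # 2024 -> 2025
growth_2026 = pct_change(spend_2026, spend_2025)  # 2025 -> 2026 (projected)

# Assumption: "56% above projections" means actual = 1.56 * prior projection.
implied_prior_projection = round(spend_2025 / 1.56, 2)

print(f"2024 -> 2025 growth: {growth_2025}%")
print(f"2025 -> 2026 projected growth: {growth_2026}%")
print(f"Implied earlier projection for 2025: ${implied_prior_projection}M")
```

On these figures, the projected 2026 increase is large in dollar terms but a much slower rate of growth than the jump from 2024 to 2025.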

OpenAI maintains the largest average share of enterprise AI wallet, but that share is shrinking – from 62% in 2024 to a projected 53% in 2026. Anthropic’s share, meanwhile, is growing from 14% to a projected 18% over the same period, with Google showing similar gains.

GPT-5.3-Codex is available immediately for paid ChatGPT users across all Codex surfaces: the desktop app, command-line interface, IDE extensions, and web interface. API access is expected to follow. OpenAI promises more capabilities in the coming weeks, with Altman declaring: “I believe Codex is going to win.”