Alibaba’s Qwen team has released Qwen3-Coder-Next, an 80-billion-parameter language model specifically designed for coding agents. The release marks a significant escalation in the competitive landscape of AI coding assistants, following recent launches from Anthropic and OpenAI and the growing adoption of open-source frameworks such as OpenClaw.
Qwen3-Coder-Next distinguishes itself through an ultra-sparse Mixture-of-Experts (MoE) architecture, activating only 3 billion parameters per forward pass. This design aims to deliver high performance with a lightweight footprint, making it cost-effective for deployment. Alibaba intends this to set a “new standard for open-weight intelligence,” according to reports.
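To make the sparsity concrete, the sketch below shows how a top-k Mixture-of-Experts router activates only a handful of experts per token while the rest of the parameters stay idle. It is a minimal PyTorch illustration; the expert count, hidden size, and top-k value are placeholders, not Qwen3-Coder-Next’s actual configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoELayer(nn.Module):
    """Illustrative top-k MoE layer: only k of n_experts feed-forward blocks run per token."""
    def __init__(self, d_model=512, n_experts=64, k=4):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                          # x: [num_tokens, d_model]
        scores = self.router(x)                    # [num_tokens, n_experts]
        topk_scores, topk_idx = scores.topk(self.k, dim=-1)
        gates = F.softmax(topk_scores, dim=-1)     # routing weights over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e in topk_idx[:, slot].unique().tolist():
                mask = topk_idx[:, slot] == e      # tokens routed to expert e in this slot
                out[mask] += gates[mask, slot].unsqueeze(-1) * self.experts[e](x[mask])
        return out

layer = ToyMoELayer()
print(layer(torch.randn(16, 512)).shape)           # torch.Size([16, 512])
```

Because only k experts run per token, compute per forward pass scales with the active parameters rather than the total parameter count, which is the property the 80B/3B split exploits.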
Addressing the Long-Context Challenge
A core innovation behind Qwen3-Coder-Next is its ability to handle exceptionally long codebases. The model supports a context window of 262,144 tokens and is designed to extrapolate to as many as one million tokens. This is achieved through a hybrid architecture combining Gated DeltaNet with Gated Attention, addressing the quadratic scaling issues that typically plague traditional Transformer models when processing large amounts of data.
Traditional Transformers face a “memory wall” where processing cost grows quadratically with sequence length. Gated DeltaNet offers a linear-complexity alternative to standard softmax attention, allowing the model to maintain state across its extensive context window without significant latency penalties. Combined with the sparse MoE architecture, this results in a theoretical 10x throughput increase for repository-level tasks compared to dense models of similar capacity.
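The intuition can be seen in the toy comparison below: full softmax attention materializes an n-by-n score matrix, while a recurrent, gated linear-attention-style update keeps a fixed-size state per layer. This is a deliberately simplified sketch; the actual Gated DeltaNet update rule is more involved.

```python
import torch

def full_attention(q, k, v):
    """Standard softmax attention: the n x n score matrix makes cost quadratic in length."""
    scores = (q @ k.transpose(-2, -1)) / q.shape[-1] ** 0.5   # [n, n]
    return torch.softmax(scores, dim=-1) @ v

def gated_linear_attention(q, k, v, decay=0.99):
    """Simplified recurrent alternative: a fixed-size state is updated once per token,
    so cost grows linearly with sequence length and memory stays constant."""
    n, d = q.shape
    state = torch.zeros(d, v.shape[-1])
    out = torch.empty(n, v.shape[-1])
    for t in range(n):
        # fold the new key/value pair into the running state, with a scalar gate (decay)
        state = decay * state + torch.outer(k[t], v[t])
        out[t] = q[t] @ state
    return out
```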
To mitigate context hallucination during training, the Qwen team employed Best-Fit Packing (BFP), a strategy designed to maintain efficiency and avoid errors associated with traditional document concatenation.
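Best-Fit Packing itself is a bin-packing heuristic: each document is placed, whole, into the training sequence whose remaining space fits it most tightly, instead of being concatenated with unrelated text and split at arbitrary boundaries. The snippet below is a generic best-fit-decreasing sketch, not the Qwen team’s implementation.

```python
def best_fit_packing(doc_lengths, context_len):
    """Pack documents into fixed-length training sequences with best-fit decreasing.
    Keeping each document intact avoids the truncation and cross-document
    concatenation artifacts associated with context hallucination."""
    bins = []  # each bin: [remaining_space, list_of_doc_lengths]
    for length in sorted(doc_lengths, reverse=True):
        if length > context_len:
            continue  # over-long documents need separate handling (e.g., chunking)
        # pick the sequence with the least leftover room that still fits this document
        best = min((b for b in bins if b[0] >= length), key=lambda b: b[0], default=None)
        if best is None:
            best = [context_len, []]
            bins.append(best)
        best[0] -= length
        best[1].append(length)
    return [docs for _, docs in bins]

# Example: pack documents of varying token counts into 8,192-token training sequences
print(best_fit_packing([7000, 3000, 2500, 1500, 500], context_len=8192))
```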
Agentic Training and Specialized Experts
Unlike previous coding models trained on static code-text pairs, Qwen3-Coder-Next was developed through a large-scale “agentic training” pipeline. This involved generating 800,000 verifiable coding tasks based on real-world bug-fixing scenarios sourced from GitHub pull requests, paired with fully executable environments. The training infrastructure, MegaFlow, leverages Alibaba Cloud Kubernetes to orchestrate a three-stage workflow: agent rollout, evaluation, and post-processing.
During rollout, the model interacts with a live, containerized environment. If the generated code fails unit tests or causes crashes, it receives immediate feedback through reinforcement learning, enabling it to learn from its mistakes and refine solutions in real-time.
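Schematically, one rollout in such a pipeline can be pictured as below. The agent and task interfaces, the pytest command, and the binary reward are placeholders for illustration, not MegaFlow’s actual internals.

```python
import subprocess

def run_unit_tests(workdir: str) -> bool:
    """Run the task's test suite inside the containerized workspace (placeholder command)."""
    result = subprocess.run(["pytest", "-q"], cwd=workdir, capture_output=True, text=True)
    return result.returncode == 0

def rollout_episode(agent, task) -> float:
    """One agent rollout: propose a patch, execute it against real tests, return a verifiable reward."""
    patch = agent.generate_patch(task.description, task.repo_snapshot)  # hypothetical interface
    task.apply_patch(patch)                       # write the patch into the live environment
    reward = 1.0 if run_unit_tests(task.workdir) else 0.0
    agent.record_feedback(task, patch, reward)    # feedback later consumed by the RL update
    return reward
```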
The model supports 370 programming languages, a significant expansion from the 92 supported in previous versions. It also introduces a new XML-style tool calling format, designed to handle string-heavy arguments and long code snippets more efficiently than traditional JSON-based methods.
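The exact schema is defined by the model’s chat template, but the appeal of an XML-style call is easy to illustrate: multi-line code can sit in element text without escaping every newline and quote the way JSON string values require. The tag names below are hypothetical.

```python
import json
import xml.etree.ElementTree as ET

code_snippet = 'def greet(name):\n    return f"Hello, {name}!"\n'

# JSON-style call: the multi-line code body must be escaped into a single string value
json_call = json.dumps({"name": "write_file",
                        "arguments": {"path": "greet.py", "content": code_snippet}})

# XML-style call (illustrative tags): the code body is embedded as element text
# (special characters such as < or & would still need XML escaping)
xml_call = f"""<tool_call>
  <function name="write_file">
    <parameter name="path">greet.py</parameter>
    <parameter name="content">{code_snippet}</parameter>
  </function>
</tool_call>"""

root = ET.fromstring(xml_call)
params = {p.get("name"): p.text for p in root.iter("parameter")}
print(params["path"])   # greet.py
```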
Qwen3-Coder-Next also utilizes specialized “Expert Models” for Web Development and User Experience (UX). The Web Development Expert focuses on full-stack tasks, rendering code samples in a Playwright-controlled Chromium environment with a Vite server for dependency management. A Vision-Language Model (VLM) then assesses the rendered pages for layout and UI quality. The UX Expert was optimized for tool-call format adherence, improving robustness across diverse CLI/IDE scaffolds.
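As a rough illustration of the rendering step, the sketch below uses Playwright’s Python API to load a locally served page in headless Chromium and capture a screenshot that a VLM judge could then score. The Vite dev-server URL and the downstream scoring call are assumptions.

```python
from playwright.sync_api import sync_playwright

def capture_rendered_page(url: str = "http://localhost:5173", out: str = "render.png") -> str:
    """Render a generated web app in headless Chromium and save a screenshot for VLM review.
    The URL assumes a local Vite dev server; the VLM scoring step itself is not shown."""
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page(viewport={"width": 1280, "height": 800})
        page.goto(url, wait_until="networkidle")   # wait for the app to finish loading
        page.screenshot(path=out, full_page=True)
        browser.close()
    return out
```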
The capabilities of these specialized experts are distilled back into the core 80B/3B MoE model, ensuring the lightweight deployment version retains nuanced knowledge.
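The report does not spell out the distillation objective; a common formulation, shown here purely as a generic sketch, has the smaller student match the softened token distribution of the expert teacher.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Textbook logit distillation, not the Qwen team's disclosed recipe:
    minimize KL(teacher || student) over temperature-softened distributions."""
    t = temperature
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    # scale by T^2 so gradient magnitudes stay comparable across temperatures
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * (t * t)
```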
Performance and Security Benchmarks
Evaluations using the SWE-Agent scaffold demonstrate Qwen3-Coder-Next’s efficiency relative to its active parameter count. On SWE-Bench Verified, the model achieved a score of 70.6%, outperforming DeepSeek-V3.2 (70.2%) while trailing GLM-4.7 (74.2%).
The model also exhibits robust security awareness. On SecCodeBench, which evaluates vulnerability repair capabilities, Qwen3-Coder-Next outperformed Claude-Opus-4.5 in code generation scenarios (61.2% vs. 52.5%). It maintained high scores even without security hints, suggesting it has learned to anticipate common security pitfalls during training. In multilingual security evaluations, it also achieved a func-sec@1 score of 56.32% on the CWEval benchmark, surpassing both DeepSeek-V3.2 and GLM-4.7.
Qwen3-Coder-Next is available on Hugging Face and GitHub under the permissive Apache 2.0 license, enabling both commercial and non-commercial use. The model weights are available in four variants, accompanied by a technical report detailing the training approach and its key innovations.
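For readers who want to try the weights, loading an open checkpoint through Hugging Face Transformers follows the usual pattern; the repository ID below is a placeholder and should be replaced with whichever published variant you intend to run.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-Coder-Next"  # placeholder; use the actual variant name from Hugging Face

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

prompt = "Write a Python function that checks whether a string is a palindrome."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```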
