News Directory 3
Microsoft Azure ND GB300 Inference: 1.1 Million Tokens/Sec

November 4, 2025 Lisa Park - Tech Editor Tech

Azure GB300 Achieves Industry-First: Over 1 Million Tokens Per Second in AI Inference

Published November 4, 2025, at 05:56 AM PST

Microsoft announced on November 2, 2025, that its new Azure ND GB300 virtual machine has surpassed one million tokens per second in artificial intelligence (AI) inference. Microsoft describes the result as an industry first and a substantial improvement over previous hardware generations. The performance was independently validated by Signal65, a performance-validation and benchmarking firm.

What: Microsoft’s Azure ND GB300 virtual machine achieves over 1 million tokens per second in AI inference.
Where: Azure cloud platform.
When: Announced November 2, 2025.
Why it matters: Represents a major leap in AI inference speed and efficiency, enabling faster and more responsive AI applications.
What’s next: The Azure ND GB300 VMs are now available, offering enterprises enhanced AI capabilities.

Signal65 Validation and Performance Gains

According to a blog post by Signal65, the Azure ND GB300 delivers a 27% improvement in inference performance compared to the previous NVIDIA GB200 generation, with only a 17% increase in power consumption. This efficiency gain matters for reducing operational costs and environmental impact.
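The two percentages above can be combined into a single performance-per-watt figure. The following is a minimal sketch of that arithmetic, assuming both figures were measured on the same workload; the function name `relative_efficiency` is illustrative and does not come from Signal65 or Microsoft.

```python
def relative_efficiency(perf_gain: float, power_increase: float) -> float:
    """Return the performance-per-watt ratio of new vs. old hardware.

    Both arguments are fractional changes (0.27 means +27%).
    """
    return (1.0 + perf_gain) / (1.0 + power_increase)


# Figures quoted from Signal65: +27% inference performance, +17% power.
# Assumption: both were measured under the same workload.
ratio = relative_efficiency(perf_gain=0.27, power_increase=0.17)
print(f"Performance per watt vs. GB200: {ratio:.3f}x "
      f"({(ratio - 1) * 100:.1f}% better)")
```

Under these assumptions the ratio is 1.27 / 1.17, or roughly an 8.5% gain in performance per watt over the GB200 generation.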

Signal65 further reported that the GB300 offers nearly a 10x increase in inference performance over the NVIDIA H100 generation, coupled with a nearly 2.5x improvement in power efficiency when measured at the rack level. This substantial improvement positions Azure as a leader in providing high-performance, energy-efficient AI infrastructure.

“This milestone is significant not just for breaking the one-million-token-per-second barrier and being an industry-first, but for doing so on a platform architected to meet the dynamic use and data governance needs of modern enterprises,” said Russ Fellows, VP of Labs at Signal65.

Understanding Tokens and AI Inference

In the context of large language models (LLMs), a “token” is a unit of text: it can be a word, part of a word, or even a single character. The number of tokens processed per second (tokens/s) is a key metric for evaluating the speed of AI inference. Higher tokens/s rates translate to faster response times for AI applications, such as chatbots, content generation tools, and code completion assistants.
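To make the tokens/s metric concrete, the sketch below converts a throughput figure into processing time and into a per-request rate. It assumes the headline 1.1 million tokens/s is aggregate throughput shared evenly across concurrent requests, which the article does not state; the stream count and function names are hypothetical and chosen only for illustration.

```python
def seconds_to_process(total_tokens: int, tokens_per_second: float) -> float:
    """Time needed to process total_tokens at a fixed throughput."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return total_tokens / tokens_per_second


def per_stream_rate(aggregate_tokens_per_second: float,
                    concurrent_streams: int) -> float:
    """Tokens/s seen by one request if throughput is shared evenly."""
    if concurrent_streams <= 0:
        raise ValueError("concurrent_streams must be positive")
    return aggregate_tokens_per_second / concurrent_streams


AGGREGATE_RATE = 1_100_000      # headline figure from the article title
HYPOTHETICAL_STREAMS = 10_000   # assumption for illustration only

batch_seconds = seconds_to_process(1_000_000_000, AGGREGATE_RATE)
stream_rate = per_stream_rate(AGGREGATE_RATE, HYPOTHETICAL_STREAMS)
print(f"1 billion tokens: about {batch_seconds:.0f} s")
print(f"Per-stream rate: {stream_rate:.0f} tokens/s")
```

The split between aggregate and per-stream rates is the point of the example: a high aggregate tokens/s figure describes how much total work the system completes, while the speed a single user sees depends on how many requests share it.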

AI inference is the process of using a trained AI model to make predictions or generate outputs based on new input data. Efficient inference is critical for deploying AI applications in real-world scenarios where low latency and high throughput are essential.

Azure ND GB300: Key Specifications

While detailed specifications are still emerging, the Azure ND GB300 VMs are built around NVIDIA GB300 GPUs. These GPUs feature significant architectural improvements designed to accelerate AI workloads. Microsoft has optimized the Azure platform to fully leverage the capabilities of the GB300, resulting in the observed performance gains.

| Metric | Azure ND GB300 | NVIDIA GB200 | NVIDIA H100 |
| --- | --- | --- | --- |
| Inference performance | > 1 million tokens/s | ~740,000 tokens/s (estimated) | ~100,000 tokens/s (estimated) |
| Performance improvement (vs. GB200) | 27% | – | – |
| Performance improvement (vs. H100) | ~10x | – | – |