
Ring-1T: Ant Engineers Solve Reinforcement Learning Bottlenecks

by Lisa Park - Tech Editor








Ant Group’s Ring-1T: A New Open-Source Reasoning Model Challenges AI Leaders

Overview

China’s Ant Group, an affiliate of Alibaba, has published technical details of its new model, Ring-1T, which the company calls “the first open-source reasoning model with one trillion total parameters.”

Ring-1T aims to compete with other reasoning models such as OpenAI’s GPT-5 and o-series, as well as Google’s Gemini 2.5. With the latest release, Ant extends the geopolitical debate over who will dominate the AI race: China or the US.

Ant Group said Ring-1T is optimized for mathematical and logical problems, code generation, and scientific problem-solving.

“With approximately 50 billion activated parameters per token, Ring-1T achieves state-of-the-art performance across multiple challenging benchmarks – despite relying solely on natural language reasoning capabilities,” Ant said in a paper.
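The “50 billion activated parameters per token” figure describes sparse activation: only a fraction of the trillion total parameters runs for any given token, as in mixture-of-experts routing. Below is a minimal illustrative sketch of top-k expert gating; the function name, expert counts, and thresholds are assumptions for illustration, not Ring-1T’s actual design.

```python
import numpy as np

def topk_expert_routing(gate_logits, k=2):
    """Illustrative top-k gating: select the k highest-scoring experts
    for a token and softmax-normalize their weights. Only the selected
    experts' parameters are 'activated' for this token."""
    idx = np.argsort(gate_logits)[::-1][:k]          # indices of top-k experts
    shifted = gate_logits[idx] - gate_logits[idx].max()
    weights = np.exp(shifted)
    return idx, weights / weights.sum()

# Hypothetical example: 4 experts, route each token to the top 2.
gate_logits = np.array([0.1, 2.0, -1.0, 1.5])
experts, weights = topk_expert_routing(gate_logits, k=2)
```

Because the gate selects only a handful of experts per token, per-token compute scales with the activated subset rather than the full parameter count, which is how a trillion-parameter model can run with roughly 50 billion active parameters per token.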

Ring-1T, which was first released in preview in September, adopts the same architecture as Ling 2.0 and was trained on the Ling-1T-base model the company released earlier this month. Ant said this allows the model to support a context window of up to 128,000 tokens.

New Methods of Training

To train a model as large as Ring-1T, researchers had to develop new methods to scale reinforcement learning (RL).

Ant Group developed three “interconnected innovations” to support the RL training of Ring-1T, a challenge given the model’s size and the large compute requirements that entails. These three are IcePop, C3PO++ and ASystem.

IcePop removes noisy gradient updates to stabilize training without slowing inference, helping eliminate catastrophic training-inference misalignment in RL.
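The core idea described above is that when the training engine and the inference engine assign noticeably different probabilities to the same sampled token, the resulting gradient is unreliable and is best discarded. A minimal sketch of that general idea follows; the thresholds, function names, and the simple REINFORCE-style update are illustrative assumptions, not the paper’s exact formulation.

```python
import numpy as np

def icepop_mask(p_train, p_infer, low=0.5, high=2.0):
    """Sketch of IcePop-style masking (hypothetical thresholds).
    Keep only tokens whose training/inference probability ratio
    stays inside [low, high]; tokens outside are treated as noisy."""
    ratio = p_train / p_infer
    return (ratio >= low) & (ratio <= high)

def masked_policy_gradient(logp_train, p_train, p_infer, advantages):
    """Zero out the per-token policy-gradient terms for masked tokens,
    so divergent tokens contribute nothing to the RL update."""
    mask = icepop_mask(p_train, p_infer)
    return np.where(mask, logp_train * advantages, 0.0)

# Hypothetical example: the third token's probabilities disagree badly
# between the two engines, so its gradient term is dropped.
p_train = np.array([0.20, 0.50, 0.01])
p_infer = np.array([0.25, 0.50, 0.50])
grads = masked_policy_gradient(np.log(p_train), p_train, p_infer, np.ones(3))
```

Masking (rather than merely clipping) means a divergent token contributes zero gradient instead of a clipped but still biased one, which is one way to keep training stable without changing the inference path at all.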
