Better AI Depends on Improved Data Infrastructure
- As artificial intelligence continues to revolutionize various industries, the investment in AI infrastructure is surging.
- The real challenge, however, lies not just in the investment but in leveraging AI to its full potential.
- AI models and the data required to train them have grown exponentially.
Feature: Artificial Intelligence and the Quest for Data Management
As artificial intelligence continues to revolutionize various industries, the investment in AI infrastructure is surging. The first half of 2024 saw AI infrastructure investment hit $31.8 billion, according to International Data Corporation (IDC). By 2028, full-year spending is expected to exceed $100 billion as AI becomes pervasive in enterprises through greater use of discrete applications. Including AI-enabled applications and related IT and business services, total worldwide spending is forecast to reach $632 billion in 2028.
The real challenge, however, lies not just in the investment but in leveraging AI to its full potential. Enterprises seeking to optimize operations and enhance return on investment (ROI) must focus on data management throughout the AI pipeline. Traditional storage and data management solutions, whether on-premises or in the cloud, are already strained by the demands of AI.
Capacity is a significant issue. AI models and the data required to train them have grown exponentially. For instance, Google’s BERT model had roughly 100 million parameters at its launch in 2018, while OpenAI’s GPT-4 has been estimated to have over a trillion. At the other end of the pipeline, the need for real-time inference makes latency and throughput equally critical.
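To put the capacity point in rough numbers, the sketch below computes the growth factor between the two parameter counts cited above and the raw storage needed for the weights alone. The 2 bytes per parameter (16-bit weights) is an assumption made for illustration, not a figure from either vendor, and training data, checkpoints, and optimizer state would add considerably more.

```python
# Back-of-the-envelope sizing. Parameter counts are the rounded figures
# cited in the text; BYTES_PER_PARAM is an assumed 16-bit precision.
BERT_PARAMS = 100_000_000                 # ~100 million (2018)
LARGE_MODEL_PARAMS = 1_000_000_000_000    # ~1 trillion (estimate)
BYTES_PER_PARAM = 2                       # assumption: fp16/bf16 weights


def weight_bytes(params: int, bytes_per_param: int = BYTES_PER_PARAM) -> int:
    """Raw storage for model weights only, ignoring optimizer state."""
    return params * bytes_per_param


growth_factor = LARGE_MODEL_PARAMS // BERT_PARAMS
bert_gb = weight_bytes(BERT_PARAMS) / 1e9
large_tb = weight_bytes(LARGE_MODEL_PARAMS) / 1e12

print(f"Growth factor: {growth_factor:,}x")
print(f"100M-parameter weights: {bert_gb:.1f} GB")
print(f"Trillion-parameter weights: {large_tb:.1f} TB")
```

Under these assumptions the weights alone grow from a fraction of a gigabyte to terabytes, which is why storage capacity and throughput have to scale alongside compute.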
AI requires a multitude of data types and storage solutions, spanning structured, semi-structured, and unstructured data. This necessitates a range of underlying storage infrastructure—block, file, and object storage. Managing the complexity of capturing and securing this data across distributed sources poses a significant challenge. Enterprises must ensure they have visibility across their entire data estate and AI pipeline, with secure and intelligent data management.
When Legacy Means Lags
The advent of newer and more specialized AI models doesn’t eliminate these fundamental issues. When the Chinese AI engine DeepSeek burst onto the market in the first half of 2025, it highlighted how the case for significant investments in AI infrastructure is constantly evolving. Organizations often face difficulties accessing and deriving value from the volumes of data required for AI operations.
Even so, the current trends towards cheaper compute power and more accessible AI infrastructure are making it feasible to master and deploy significant AI capabilities within enterprises.
“If the computational part gets cheaper, it means more people participate, and many more models are trained. With more people and more models, the challenge of preparing and deploying data to support this surge becomes even more critical.”
Sven Oehme, chief technology officer at DataDirect Networks
Oehme highlights that the issue extends beyond raw performance and capacity. Managing data intelligently and securely is paramount. Metadata, for example, can significantly reduce the amount of data that needs to be analyzed by narrowing down the relevant data. In sectors like autonomous vehicles, where the ability to analyze metadata such as time of day, speed, and direction is crucial, managing this efficiently is a game-changer.
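The metadata idea can be sketched in a few lines: query a small metadata catalog first, then fetch only the matching payloads. This is a minimal illustration, not DDN's implementation. The `FrameMetadata` record, its field names, and the `select_frames` helper are all hypothetical.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class FrameMetadata:
    """Hypothetical per-frame metadata record; field names are illustrative."""
    frame_id: str
    captured_at: time      # time of day
    speed_kmh: float
    heading_deg: float     # 0 = north, 90 = east


def select_frames(catalog, start, end, min_speed_kmh, heading_range):
    """Return IDs of frames whose metadata matches, without touching payloads."""
    low, high = heading_range
    return [
        m.frame_id
        for m in catalog
        if start <= m.captured_at <= end
        and m.speed_kmh >= min_speed_kmh
        and low <= m.heading_deg <= high
    ]


catalog = [
    FrameMetadata("f001", time(7, 45), 92.0, 85.0),
    FrameMetadata("f002", time(13, 10), 48.0, 180.0),
    FrameMetadata("f003", time(22, 30), 101.0, 90.0),
    FrameMetadata("f004", time(23, 5), 30.0, 270.0),
]

# Night-time, high-speed, roughly eastbound frames only.
selected = select_frames(catalog, time(21, 0), time(23, 59), 80.0, (45.0, 135.0))
print(selected)
```

Only the frames that pass the metadata filter would then be read from bulk storage, so the expensive image data for the other frames is never loaded.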
Adding to the challenge is the highly distributed nature of data sources, which can create a bureaucratic headache for data management. Organizations often juggle multiple databases and event systems, which can be expensive, complex, and time-consuming. This creates latency issues, which are especially critical in fields like autonomous driving and healthcare. Even tech giants like AWS have had to develop separate products, such as S3 Metadata, to address this complexity.
In practical terms, an autonomous or connected vehicle is constantly gathering images, metadata, and other information critical to AI applications. Managing this efficiently requires a sophisticated data infrastructure that can handle the rich data involved and provide full governance from data creation to consumption.
Data Needs Intelligence, Too
Data management solutions must deliver more than just hardware performance. They need to manage data securely, at scale, and be accessible both in the cloud and on-premises. Experts indicate that platform solutions such as these need to offer multi-tenancy, catering to various environments from enterprise applications to hyperscalers.
DataDirect Networks (DDN) offers a solution with its Data Intelligence Platform. This platform consists of two key components. Infinia 2.0 is a software-defined storage platform providing a unified view of an organization’s data collections. EXAScaler, its highly scalable file system, is optimized for high-performance, big data, and AI workloads.
Oehme explains, “Infinia is a data platform that also happens to speak many storage protocols, including those for structured data. It allows you to store data, but not just normal data files and objects. It allows me to store a huge amount of metadata combined with unstructured data in the same view.”
This approach eliminates the need for multiple silos and the complexities of managing multiple data analysis and management tools, resulting in more efficient data pipelines and operations. The Infinia 2.0 and EXAScaler platforms are designed to scale from enterprise applications to cloud service providers and hyperscalers. They support up to 100 PB in a single rack, delivering a 75 percent reduction in power, cooling, and data center footprint, with 99.999 percent uptime. The impacts of high density and efficiency are particularly relevant to the future of AI, where access to power and real estate is rapidly becoming a major constraint.
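As a rough worked example of what those figures imply, the sketch below converts the quoted 99.999 percent uptime into allowable downtime per year and estimates the rack count for a given data estate at the quoted 100 PB per rack. The 250 PB estate is a hypothetical input, and the calculation ignores replication, headroom, and leap years.

```python
import math

MINUTES_PER_YEAR = 365 * 24 * 60   # 525,600, ignoring leap years
AVAILABILITY = 0.99999             # "five nines", as quoted in the text
RACK_CAPACITY_PB = 100             # per-rack maximum quoted in the text


def downtime_minutes_per_year(availability: float) -> float:
    """Maximum downtime per year consistent with the given availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def racks_needed(dataset_pb: float, rack_capacity_pb: float = RACK_CAPACITY_PB) -> int:
    """Whole racks required to hold the dataset, with no headroom or replicas."""
    return math.ceil(dataset_pb / rack_capacity_pb)


downtime = downtime_minutes_per_year(AVAILABILITY)
racks = racks_needed(250)  # hypothetical 250 PB data estate

print(f"Allowed downtime: {downtime:.2f} minutes per year")
print(f"Racks for 250 PB: {racks}")
```

Five nines works out to a little over five minutes of downtime per year, which gives a sense of how tight the availability target is for real-time workloads.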
DDN has a strong partnership with semiconductor giant Nvidia, and its platform is closely tailored to Nvidia’s systems. This partnership ensures seamless integration with Nvidia’s hardware and software stacks, supporting over 100,000 GPUs in a single deployment. For hyperscalers and cloud service providers, enhancements in performance and efficiency are paramount, especially in high-stakes industries like autonomous driving, where every millisecond counts. Examples such as the growing introduction of autonomous capabilities on U.S. manufacturing lines further underscore the significance of achieving reliable, real-time data access.
DDN is poised to play a leading role in the AI market by equipping companies to operationalize their AI and data capabilities. Effective AI systems hinge on several factors, not least ensuring appropriate governance standards and infrastructure.
