Alluxio Helps AI Teams Get More from Every GPU

June 3, 2026

SAN MATEO, Calif. — JUNE 3, 2026 — Alluxio, the developer of a leading large-scale caching solution for AI, today announced a solution designed to help organizations maximize GPU utilization and improve the efficiency of AI workloads on Oracle Cloud Infrastructure (OCI). By combining Alluxio’s data acceleration capabilities with OCI’s high-performance AI infrastructure, organizations can reduce data bottlenecks and keep GPUs continuously fed with data for training and inference.

As organizations increasingly rely on object storage as the foundation for AI, they often face tradeoffs between maintaining data in place and achieving high-performance access. Traditional approaches can require moving large datasets to align with compute resources, increasing operational complexity and cost. Alluxio helps address these challenges by enabling high-throughput, low-latency data access without requiring data migration, allowing organizations to run AI workloads more efficiently.

Alluxio can be deployed alongside GPU environments on OCI, aggregating local NVMe storage into a distributed caching layer that provides sub-millisecond data access latency and terabytes per second of aggregate throughput. This approach enables AI workloads to efficiently access data while maintaining flexibility across storage environments.

Organizations using Alluxio capabilities on OCI can benefit from:

Improved GPU Utilization: Helps reduce data access bottlenecks and enable GPUs to sustain utilization levels above 90 percent
Enhanced Cost Efficiency: Helps keep GPUs more consistently utilized, improving overall resource efficiency
High-Performance Data Access: Provides high-throughput data access at sub-millisecond latency through a distributed caching layer
Zero Data Migration: Enables access to data stored in OCI Object Storage or S3-compatible environments without copying or reformatting data
Seamless Integration: Supports standard interfaces such as POSIX and S3, allowing existing AI pipelines to run with minimal modification

By reducing the need for manual data movement and complex replication strategies, the solution helps simplify operations for organizations running AI workloads at scale.
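Because the caching layer is described as supporting standard interfaces such as POSIX, an existing pipeline could in principle read cached data with ordinary file I/O. The following is a minimal Python sketch of that idea, not an Alluxio API: it times a sequential read of one file and reports throughput. The function name `measure_read_throughput` and the mount path in the usage comment are hypothetical and chosen only for illustration.

```python
import time


def measure_read_throughput(path, chunk_size=8 * 1024 * 1024):
    """Sequentially read `path` and return (bytes_read, seconds, bytes_per_second).

    `path` is any POSIX-visible file. Whether it is served by a cache or
    read directly from a disk makes no difference to this code.
    """
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # an empty read means end of file
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    rate = total / elapsed if elapsed > 0 else float("inf")
    return total, elapsed, rate


# Usage (hypothetical mount path, assumed for illustration only):
# size, secs, rate = measure_read_throughput("/mnt/cache/models/weights.bin")
# print(f"{size} bytes in {secs:.3f} s = {rate / 1e9:.2f} GB/s")
```

Running the same function against a directly mounted path and a cache-backed path is one simple way to compare the two, although a single sequential read does not reflect the aggregate throughput of many clients.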

Fireworks AI Demonstrates Large-Scale AI Performance

Fireworks AI, an inference cloud platform delivering more than 10 trillion tokens per day, uses Alluxio to support high-performance data access across distributed GPU environments, including OCI.

Operating GPU infrastructure across heterogeneous environments, Fireworks requires extremely fast data distribution to keep large-scale inference clusters fully utilized. By deploying Alluxio as a distributed data layer alongside GPU clusters, Fireworks has built a high-performance infrastructure capable of delivering massive datasets to compute environments at unprecedented speed.

“To deliver fast, reliable inference at scale, we needed a more efficient way to manage data across our GPU infrastructure,” said Chenyu Zhao, cofounder at Fireworks AI. “With Alluxio, we’ve reduced data access times and improved overall system performance while maintaining flexibility across environments. Our infrastructure spans heterogeneous GPU environments, and we rely on efficient data access to maintain performance. By using Alluxio alongside GPU clusters—including those on OCI—we’ve built a distributed system capable of serving more than 2 PB of data daily, reducing replica download times for large models from 20 minutes to 2 minutes, and achieving up to 1 TB/s in aggregate throughput. This architecture allows us to maintain industry-leading inference performance without the operational burden of constantly moving data.”
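To put the quoted figures in perspective, the short calculation below converts them into comparable rates. It is a rough sketch under stated assumptions: decimal units (1 PB = 10^15 bytes), "more than 2 PB" treated as exactly 2 PB, and the daily volume assumed to be spread evenly over 24 hours, which real traffic will not be.

```python
PB = 10**15  # decimal petabyte, in bytes (assumption)
TB = 10**12  # decimal terabyte, in bytes (assumption)
GB = 10**9   # decimal gigabyte, in bytes (assumption)
SECONDS_PER_DAY = 24 * 60 * 60


def average_throughput_gb_s(bytes_per_day):
    """Average rate in GB/s if the daily volume were spread evenly over a day."""
    return bytes_per_day / SECONDS_PER_DAY / GB


def speedup(before_minutes, after_minutes):
    """How many times faster the 'after' duration is than the 'before' duration."""
    return before_minutes / after_minutes


daily_avg = average_throughput_gb_s(2 * PB)    # 2 PB served per day
download_gain = speedup(20, 2)                 # 20 minutes down to 2 minutes
peak_to_avg = (1 * TB / GB) / daily_avg        # 1 TB/s peak versus daily average

print(f"average throughput: {daily_avg:.1f} GB/s")
print(f"replica download speedup: {download_gain:.0f}x")
print(f"peak-to-average ratio: {peak_to_avg:.1f}")
```

Under these assumptions, 2 PB per day averages roughly 23 GB/s, the replica download time improves tenfold, and the quoted 1 TB/s peak is about 43 times the daily average, which is consistent with bursty model-distribution traffic rather than a steady stream.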

Supporting Efficient AI Infrastructure on OCI

“The goal is simple: maximize the value of every GPU,” said Haoyuan Li, CEO at Alluxio. “OCI provides some of the best GPU price-performance in the industry. By pairing that infrastructure with Alluxio’s distributed data acceleration layer, AI teams can keep GPUs fully utilized and scale compute wherever innovation demands.”

“Oracle Cloud Infrastructure is designed to deliver the performance, scalability, and cost efficiency required for today’s most demanding AI workloads,” said Sachin Menon, Vice President of Cloud Engineering at Oracle Cloud Infrastructure. “By working with partners like Alluxio, we can help customers reduce bottlenecks and run AI training and workloads with more consistent performance.”

About Fireworks AI

Fireworks AI is the global AI inference cloud and infrastructure platform that enables teams like Cursor, Uber, DoorDash, and Shopify to build, tune, and scale highly optimized generative AI applications. Fireworks provides deep support for hundreds of state-of-the-art open models in text, image, audio, embedding, and multi-modal formats globally. Visit https://fireworks.ai/ for more information.

About Alluxio

Alluxio is a leading provider of accelerated data access platforms for AI workloads. Alluxio’s distributed caching layer accelerates AI and data-intensive workloads by enabling high-speed data access across diverse storage systems. By creating a global namespace, Alluxio unifies data from multiple sources—on-premises and in the cloud—into a single, logical view, eliminating the need for data duplication or complex data movement.

Designed for scalability and performance, Alluxio brings data closer to compute frameworks like TensorFlow, PyTorch, and Spark, significantly reducing I/O bottlenecks and latency. Its intelligent caching, data locality optimization, and seamless integration with modern data platforms make it a powerful solution for teams building and scaling AI pipelines across hybrid and multi-cloud environments. Backed by leading investors, Alluxio powers technology, internet, financial services, and telecom companies, including 9 out of the top 10 internet companies globally. To learn more, visit www.alluxio.io.

Media Contact:
Amelia Wong
amelia@alluxio.com

Media Contact:
Beth Winkowski
Winkowski Public Relations, LLC for Alluxio
978-649-7189
beth@alluxio.com
