Alluxio Advances Analytics and AI with NVIDIA Accelerated Computing
March 23, 2021
Alluxio Data Orchestration Platform Now Integrated with RAPIDS Accelerator for Spark
SAN MATEO, CA – March 23, 2021 – Alluxio, the developer of open source cloud data orchestration software, today announced the integration of RAPIDS Accelerator for Apache Spark 3.0 with the Alluxio Data Orchestration Platform to accelerate data access on NVIDIA accelerated computing clusters for both analytics and artificial intelligence (AI) pipelines. Validation testing of the integration, covering caching of large datasets and data availability for NVIDIA GPU processing, showed a 2x speedup for a data analytics and business intelligence workload. At the same time, NVIDIA GPU clusters with Alluxio demonstrated 70% better return on investment (ROI) compared to CPU clusters.
Data processing is increasingly making use of NVIDIA GPUs for massive parallelism, in both analytics pipelines and AI / machine learning (ML) pipelines. The benefits of GPU acceleration for an end-to-end pipeline are limited if data access dominates execution time: GPU-based processing demands far higher data access throughput than a CPU-based cluster. With processing clusters for analytics and AI separated from data storage systems, accelerating data access enables cost savings on agile business intelligence and data science workloads.
“With the advances made from the unrivaled processing power of NVIDIA’s software and hardware, the bottleneck for users is now storage access throughout the data pipeline,” said Haoyuan Li, Founder and CEO, Alluxio. “From this integration, users now benefit from the separation of processing clusters for analytics and AI from data storage systems, accelerating data access within milliseconds to make critical decisions, find efficiencies, lower cost, and improve customer experience.”
“Accelerating data processing compute speeds means that data also needs to be accessed more quickly by data science and AI applications so that the entire pipeline works in harmony,” said Scott McClellan, Senior Director, Data Science Product Group, NVIDIA. “Alluxio’s integration of RAPIDS for Apache Spark, combined with the accelerated computing power of NVIDIA GPUs, means that Alluxio Data Orchestration customers will be able to boost the efficiency of their analytics and AI workloads without any code changes.”
Key highlights of the Alluxio with RAPIDS Accelerator for Apache Spark 3.0 integration include:
- Data locality for I/O acceleration. Alluxio manages local storage resources on the GPU cluster and provides a high performance distributed cache to accelerate data access from a remote storage cluster.
- No code changes for ease of use. No code changes are required to use RAPIDS on GPU-enabled clusters with Alluxio for storage access; the integration is enabled through configuration alone, making adoption pain-free for customers migrating from their existing software stack.
- API flexibility. Multiple data access APIs are supported to enable the use of the most appropriate processing framework for each step of the data pipeline. The distributed cache is shared to allow for high performance even when data moves from one framework to another.
RAPIDS Accelerator for Apache Spark 3.0 with Alluxio Data Orchestration Platform integration is immediately available.
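To make the configuration-only nature of the integration concrete, the sketch below shows a hypothetical spark-submit invocation that enables RAPIDS Accelerator and routes object-store reads through Alluxio without touching application code. The hostnames, bucket names, and job script are placeholder assumptions, and the exact configuration keys may vary by version; consult the Alluxio and RAPIDS Accelerator documentation for authoritative settings.

```shell
# Hypothetical sketch: RAPIDS Accelerator and Alluxio enabled purely through
# Spark configuration -- the application code itself is unchanged.
#
#   spark.plugins                       -- loads the RAPIDS Accelerator plugin
#   spark.rapids.sql.enabled            -- runs supported SQL operators on the GPU
#   spark.executor.resource.gpu.amount  -- requests one GPU per executor
#   spark.rapids.alluxio.pathsToReplace -- transparently rewrites s3:// reads
#                                          to the Alluxio namespace, so cached
#                                          data is served from the GPU cluster
spark-submit \
  --master yarn \
  --conf spark.plugins=com.nvidia.spark.SQLPlugin \
  --conf spark.rapids.sql.enabled=true \
  --conf spark.executor.resource.gpu.amount=1 \
  --conf spark.rapids.alluxio.pathsToReplace="s3://my-bucket->alluxio://alluxio-master:19998/my-bucket" \
  my_existing_job.py
```

Because the path rewriting happens inside the plugin, the same job script reads its usual s3:// URIs while Alluxio serves the data from its distributed cache on the GPU cluster.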
Resources
- To learn about the use of Alluxio to accelerate data access from NVIDIA GPUs for both analytics and AI with a discussion of benchmarking with Google Dataproc, visit this NVIDIA developer blog.
- To learn more about accelerating I/O for Apache Spark with RAPIDS, register for the GTC 2021 talk “Enabling data orchestration with RAPIDS Accelerator”.
- To get started with Alluxio and RAPIDS Accelerator for Apache Spark 3.0, visit this documentation.
- To access the RAPIDS Accelerator for Apache Spark 3.0 and the getting started guide, visit the GitHub repo.
- For more information about I/O acceleration for GPU based deep learning using Alluxio in Kubernetes, read this developer blog.
Tweet this: @Alluxio integrates @RAPIDSai to accelerate #analytics and #AI pipelines on @NVIDIAAI #GPU clusters https://bit.ly/3lEgtt0
About Alluxio
Alluxio is a leading provider of accelerated data access platforms for AI workloads. Alluxio’s distributed caching layer accelerates AI and data-intensive workloads by enabling high-speed data access across diverse storage systems. By creating a global namespace, Alluxio unifies data from multiple sources—on-premises and in the cloud—into a single, logical view, eliminating the need for data duplication or complex data movement.
Designed for scalability and performance, Alluxio brings data closer to compute frameworks like TensorFlow, PyTorch, and Spark, significantly reducing I/O bottlenecks and latency. Its intelligent caching, data locality optimization, and seamless integration with modern data platforms make it a powerful solution for teams building and scaling AI pipelines across hybrid and multi-cloud environments. Backed by leading investors, Alluxio powers technology, internet, financial services, and telecom companies, including 9 out of the top 10 internet companies globally. To learn more, visit www.alluxio.io.
Media Contact:
Beth Winkowski
Winkowski Public Relations, LLC for Alluxio
978-649-7189
beth@alluxio.com