Products
Blog

Make Multi-GPU Cloud AI a Reality
If you’re building large-scale AI, you’re already multi-cloud by choice (to avoid lock-in) or by necessity (to access scarce GPU capacity). Teams frequently chase capacity bursts, “we need 1,000 GPUs for eight weeks,” across whichever regions or providers can deliver. What slows you down isn’t GPUs, it’s data. Simply accessing the data needed to train, deploy, and serve AI models at the speed and scale required – wherever AI workloads and GPUs are deployed – is in fact not simple at all. In this article, learn how Alluxio brings Simplicity, Speed, and Scale to Multi-GPU Cloud deployments.

Alluxio's Strong Q2: Sub-Millisecond AI Latency, 50%+ Customer Growth, and Industry-Leading MLPerf Results
Alluxio's strong Q2 featured Enterprise AI 3.7 launch with sub-millisecond latency (45× faster than S3 Standard), 50%+ customer growth including Salesforce and Geely, and MLPerf Storage v2.0 results showing 99%+ GPU utilization, positioning the company as a leader in maximizing AI infrastructure ROI.
.png)
Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.
.jpeg)
Effective Data Engineering in the Cloud World
Cloud has changed the dynamics of data engineering as well as the behavior of data engineers in many ways. This is primarily because a data engineer on premise only dealt with databases and some parts of the hadoop stack. In the cloud, things are a bit different. Data engineers suddenly need to think different and broader. Instead of being purely focused on data infrastructure, you are now almost a full stack engineer (leaving out the final end application perhaps). Compute, containers, storage, data movement, performance, network — skills are increasing needed across the broader stack. Here are some design concept and data stack elements to keep in mind.
Hybrid Multi-Cloud

Starburst Presto and Alluxio announce strategic OEM partnership
Announcing the OEM partnership with Alluxio and Starburst Data, the company behind Presto, the fastest growing SQL query engine in a disaggregated world.
Large Scale Analytics Acceleration
.jpeg)
Alluxio on EMR Fast Storage Access and Sharing for Spark Jobs
Traditionally, if you want to run a single Spark job on EMR, you might follow the steps: launching a cluster, running the job which reads data from storage layer like S3, performing transformations within RDD/Dataframe/Dataset, finally, sending the result back to S3. You end up having something like this. If we add more Spark jobs across multiple clusters, you could have something like this.
Large Scale Analytics Acceleration
.jpeg)
Building a cloud-native analytics MPP database with Alluxio
This article walks through the journey of a startup HashData in Beijing to build a cloud-native high-performance MPP shared-everything architecture leveraging object storage as the data persistence layer and Alluxio as a data orchestration layer in the cloud. we will illustrate how HDW leverages Alluxio as the data orchestration layer to eliminate the performance penalty introduced by object storage while benefiting from its scalability and cost-effectiveness.
Cloud Cost Savings
Large Scale Analytics Acceleration
.jpeg)
Speeding Big Data Analytics on the Cloud with InMemory Data Accelerator
Discontinuity in big data infrastructure drives storage disaggregation, especially in companies experiencing dramatic data growth after pivoting to AI and analytics. This data growth challenge makes disaggregating storage from compute attractive because the company can scale their storage capacity to match their data growth, independent of compute. This decoupled mode allows the separation of compute and storage, enabling users to rightsize hardware for each layer. Users can buy high-end CPU and memory configurations for the compute nodes, and storage nodes can be optimized for capacity. This whitepaper is a continuation of Unlock Big Data Analytics Efficiency with Compute and Storage Disaggregation on Intel® Platforms
Cloud Cost Savings
Large Scale Analytics Acceleration
.jpeg)
Distributed Data Querying with Alluxio
This is a guest blog by Jowanza Joseph with an original blog source. It is about how he used Alluxio to reduce p99 and p50 query latencies and optimized the overall platform costs for a distributed querying application. Jowanza walks through the product and architecture decisions that lead to our final architecture, discuss the tradeoffs, share some statistics on the improvements, and discuss future improvements to the system.
No items found.
.jpeg)
Scalable Metadata Service in Alluxio: Storing Billions of Files
Alluxio provides a unified namespace where you can mount multiple different storage systems and access them through the same API. To serve the file system requests to operate on all the files and directories in this namespace, Alluxio masters must handle the file system metadata at a scale of all mounted systems combined. We are writing several engineering blogs describing the design and implementation of Alluxio master to address this scalability challenge. This is the first article focusing on metadata storage and service, particularly how to use RocksDB as an embedded persistent key-value store to encode and store the file system inode tree with high performance.
No items found.

Data Orchestration: The Missing Piece in the Data World
At Alluxio, we believe that in order to fundamentally solve the data access challenges, the world needs a new layer - a data orchestration platform - between computation frameworks and storage systems.
Hybrid Multi-Cloud
.jpeg)
Welcome to Alluxio.io
Notice anything new about our websites? That’s right - we are super excited to launch our new website - Alluxio.io! As we continue our focus on our open source community, one important item on our mind was to rebuild our website to provide better user experience for our community. To that end, you’ll see lots of changes in the Alluxio web experience.
No items found.
.jpeg)
Recap Spark+AI Summit 2019
Alluxio is a proud sponsor and exhibitor of Spark+AI Summit in San Francisco. What’s Spark+AI Summit? It’s the world’s largest conference that is focused on Apache Spark - Alluxio’s older cousin open source project from the same lab (UC Berkeley’s AMPLab - now RISElab).
Hybrid Multi-Cloud
Large Scale Analytics Acceleration
.jpeg)
Two Ways to Keep Files in Sync Between Alluxio and HDFS
Alluxio provides a distributed data access layer for applications like Spark or Presto to access different underlying file system (or UFS) through a single API in a unified file system namespace. If users only interact with the files in the UFS through Alluxio, since Alluxio has knowledge of any changes the client makes to the UFS, it will keep Alluxio namespace in sync with the UFS namespace.
No items found.
.jpeg)
Moving From Apache Thrift to gRPC: A Perspective From Alluxio
As part of the Alluxio 2.0 release, we have moved our RPC framework from Apache Thrift to gRPC. In this article, we will talk about the reasons behind this change as well as some lessons we learned along the way. In Alluxio 1.x, the RPC communication between clients and servers is built mostly on top of Apache Thrift. Thrift enabled us to define Alluxio service interface in simple IDL files and implement client binding using native Java interfaces generated by Thrift compiler. However, we faced several challenges as we continued developing new features and improvements for Alluxio.
No items found.
Your selections don't match any items.
