What’s Alluxio?

Open source data orchestration for analytics and machine learning in any cloud


Inconsistent performance on S3

S3 performance for analytic workloads is inconsistent and data egress is expensive.

Cloud caching solution >

Limited compute capacity on-prem

Making HDFS or object store data accessible to any compute in any cloud is complex.

Zero-copy burst solution >

Slow on-prem object store

Object storage performance, particularly for metadata operations, is unpredictable.

Faster workloads on object store solution >

Alluxio can help


Accelerate analytics with a multi-tiered distributed caching layer close to compute


Access data in many ways with different APIs including HDFS, S3, POSIX, Java and REST


Scale data elastically on demand with a global namespace across many storage systems


Featured use cases at DBS Bank include “zero-copy” bursting for on-prem compute and object store analytics acceleration.

Reference Architecture at DBS Bank

Meet Alluxio

Alluxio enables data orchestration for compute in any cloud. It unifies data silos on-premises and across clouds, providing the data locality, accessibility, and elasticity needed to reduce the complexity of orchestrating data for today’s big data and AI/ML workloads.

Scalable to over a billion files in a single cluster, Alluxio’s distributed architecture is built on three core components:

  • Alluxio Master, which manages file and object metadata
  • Alluxio Worker, which manages the node’s local storage space, stores file and object blocks, and interfaces with the underlying storage systems
  • Alluxio Client, which allows analytics and AI/ML applications to interface with Alluxio
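The division of labor among the three components can be illustrated with a toy model. This is a minimal sketch for intuition only: the class names, the round-robin block placement, and the tiny block size are assumptions made for the example, and none of it is Alluxio's actual API or placement logic.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    """Holds block data in node-local storage (here, just a dict)."""
    name: str
    blocks: dict[int, bytes] = field(default_factory=dict)


@dataclass
class Master:
    """Tracks metadata only: which blocks make up a file and where they live."""
    files: dict[str, list[tuple[int, str]]] = field(default_factory=dict)


class Client:
    """Application-facing entry point: asks the master, then talks to workers."""

    def __init__(self, master: Master, workers: dict[str, Worker]):
        self.master = master
        self.workers = workers
        self._next_block_id = 0

    def write(self, path: str, data: bytes, block_size: int = 4) -> None:
        names = sorted(self.workers)
        locations = []
        for offset in range(0, len(data), block_size):
            block_id = self._next_block_id
            self._next_block_id += 1
            # Round-robin placement is an arbitrary toy policy (assumption).
            worker_name = names[block_id % len(names)]
            chunk = data[offset:offset + block_size]
            self.workers[worker_name].blocks[block_id] = chunk
            locations.append((block_id, worker_name))
        # Only the block list and locations reach the master, never the bytes.
        self.master.files[path] = locations

    def read(self, path: str) -> bytes:
        # Metadata lookup on the master, data reads from the workers.
        locations = self.master.files[path]
        return b"".join(self.workers[w].blocks[b] for b, w in locations)
```

In this sketch the master never touches file contents, so metadata operations and data transfer scale independently, which is the general idea behind the split described above.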


Alluxio helped Ryte decouple S3 latency spikes from user requests without the need for additional hardware.


With Presto + Alluxio on AWS EMR, Ryte saw an average 4x improvement in Presto query performance.

Key Technical Features


Support for hyperscale workloads
Supports a billion files and thousands of workers and clients, all with high availability.

Flexible APIs
Integrates your compute frameworks like Spark, Presto, TensorFlow, Hive and more out-of-the-box using the HDFS, S3, Java, RESTful, or POSIX-based APIs.

Intelligent data caching and tiering
Automatically utilizes near-compute storage media for optimal data placement based on data topology and workload.
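One common way to think about tiering is a stack of caches ordered from fastest to slowest, where entries evicted from a faster tier are demoted to the next one and entries that are read again are promoted back to the top. The sketch below assumes a simple least-recently-used policy and made-up tier names and capacities. It illustrates the general idea only and does not describe Alluxio's own placement or eviction algorithms.

```python
from collections import OrderedDict


class TieredCache:
    """Toy multi-tier LRU cache; tiers are listed fastest first."""

    def __init__(self, capacities):
        # capacities: e.g. [("MEM", 2), ("SSD", 4)] (illustrative names/sizes).
        self.tiers = [(name, cap, OrderedDict()) for name, cap in capacities]

    def put(self, key, value, level=0):
        if level >= len(self.tiers):
            return  # fell out of the slowest tier: no longer cached
        _name, cap, store = self.tiers[level]
        store[key] = value
        store.move_to_end(key)  # mark as most recently used
        if len(store) > cap:
            # Demote the least recently used entry to the next tier down.
            old_key, old_value = store.popitem(last=False)
            self.put(old_key, old_value, level + 1)

    def get(self, key):
        for _name, _cap, store in self.tiers:
            if key in store:
                value = store.pop(key)
                self.put(key, value, 0)  # promote to the fastest tier
                return value
        return None  # cache miss: a real system would read the under store

    def tier_of(self, key):
        for name, _cap, store in self.tiers:
            if key in store:
                return name
        return None
```

For example, with `TieredCache([("MEM", 2), ("SSD", 2)])`, inserting three keys pushes the oldest one down to the `SSD` tier, and reading it again moves it back to `MEM`.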


Built-in data policies
Provides highly customizable data policies for persistence, cross storage data migration, and distributed load.

Plug and play under stores
Integrates your under store systems like HDFS, S3, Azure Blob Storage, Google Cloud Storage and more through a range of interfaces.

Transparent unified namespace for file system and object stores
Mounts multiple storage systems into a single consolidated namespace for both read and write workloads.
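A consolidated namespace of this kind can be pictured as a mount table that maps path prefixes to locations in different storage systems. The following is a minimal sketch assuming longest-prefix matching. `MountTable`, the bucket name, and the host name are hypothetical and chosen only for illustration; this is not Alluxio's API.

```python
class MountTable:
    """Toy mount table: maps namespace prefixes to under-store URIs."""

    def __init__(self):
        self._mounts = {}

    def mount(self, namespace_path, under_store_uri):
        # Normalize so "/logs/" and "/logs" refer to the same mount point.
        key = namespace_path.rstrip("/") or "/"
        self._mounts[key] = under_store_uri.rstrip("/")

    def resolve(self, path):
        # A mount point matches if it equals the path or is a parent directory.
        candidates = [
            m for m in self._mounts
            if path == m or path.startswith(m.rstrip("/") + "/")
        ]
        if not candidates:
            raise KeyError(f"no mount point covers {path!r}")
        mount_point = max(candidates, key=len)  # longest prefix wins
        suffix = path[len(mount_point):].lstrip("/")
        base = self._mounts[mount_point]
        return f"{base}/{suffix}" if suffix else base


table = MountTable()
table.mount("/", "hdfs://namenode:8020/warehouse")   # hypothetical HDFS root
table.mount("/logs", "s3://example-bucket/raw-logs")  # hypothetical S3 bucket
```

With these two mounts, an application addresses `/tables/t1` and `/logs/2024/app.log` through one namespace, while the table resolves them to HDFS and S3 respectively.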


Provides data protection on the wire and in the cloud with built-in auditing, role-based access control, LDAP, Active Directory, and encrypted communications.

Monitoring and management
Provides a user-friendly web interface and command line tools, allowing users to monitor and manage their cluster.

Enterprise high availability with tiered locality
Includes adaptive replication across regions and zones to maximize performance and availability.

Get Started with Alluxio

Alluxio offers a free Community Edition and an Enterprise Edition.