Alluxio Architecture: A Decentralized Data Acceleration Layer for the AI Era
Authors: Bin Fan, VP of Technology at Alluxio; Haoyuan Li, Founder & CEO at Alluxio


In this whitepaper, you will learn how Alluxio's Decentralized Object Repository Architecture (DORA) delivers sub-millisecond latency and terabytes per second of throughput to power large-scale AI training, deployment, and inference workloads in single-cloud and multi-cloud deployments.

1. Abstract

This white paper presents the Alluxio architecture, a cloud-native Data Acceleration Layer built to bridge the gap between high-performance GPU computing and distributed cloud storage. Alluxio addresses the critical I/O and data-mobility challenges faced by modern AI infrastructure, where compute performance has far surpassed data access capabilities.

The Decentralized Object Repository Architecture (DORA) design eliminates centralized bottlenecks through fully decentralized metadata and caching, enabling sub-millisecond latency, TB/s throughput, and 97–98% GPU utilization across large-scale, multi-cloud environments. Performance results on GPU cloud environments demonstrate that Alluxio achieves 10 GiB/s per server (100Gbps NIC), <1 ms latency, and delivers FSx-level performance at one-third the cost.

“Bring Data Close to Compute — Anywhere, Any Cloud.”

2. Motivation and Problem Statement

Modern AI infrastructure faces data challenges because GPU compute performance is rapidly outpacing data access capabilities.

  • Data is not plug-and-play for GPUs. Data gravity makes it difficult to achieve a plug-and-play experience for AI researchers and engineers. Workloads are often deployed wherever GPU capacity is more available or less expensive, while datasets remain bound to specific storage regions. Migrating or moving data on demand is often required but time-consuming, error-prone, and expensive. AI researchers and scientists do not want to deal with issues like "too large to fit on local disk," unstable I/O, or wrong URLs and permission errors.

  • High I/O Requirements for GPUs. During training, a single GPU may require more than 1 GB/s of I/O throughput to sustain computation, and for a large-scale GPU cluster (e.g., thousands of GPUs) the peak throughput demand can reach terabytes per second. Slow data delivery directly translates into wasted GPU cycles and millions of dollars in unused compute capacity.

  • Massive Number of Files in Datasets. Training multimodal models may require billions of files and associated metadata entries. In traditional distributed file systems, a centralized metadata service typically becomes the first point of slowness or failure, and ultimately a scalability barrier.

These challenges call for an Easy, Fast, and Scalable data access solution that enables users to focus on training, deploying, and serving AI models without worrying about data configuration or migration.

In short, Alluxio aims to enable researchers and engineers to access data wherever their compute runs — simply mount existing cloud bucket(s) as local folders but with local NVMe performance and start accessing data immediately without any migration.

Why Existing Solutions Fall Short

There are many data solutions in the ecosystem, but none address all three dimensions of scalability, simplicity, and cloud mobility:

  • Single-Node CLI Tools (e.g., s3fs, gcsfs): Convenient for mounting object storage on a single node, but lack distribution, sharing, and concurrency across clusters.

  • HPC Storage Systems (e.g., Lustre, GPFS, VAST Data, Weka): Deliver excellent performance, but remain heavyweight and often expensive data silos. They do not solve data gravity or cross-cloud accessibility.

  • Managed Cloud Caching (e.g., AWS FSx for Lustre, GCP Anywhere Cache): Closer to Alluxio's goal in performance and target user experience, but require dedicated provisioning, are bound to a single cloud, and lack a lightweight software-only footprint.

Alluxio’s approach, on the other hand, is a software-defined, cloud-native data acceleration layer that complements existing object storage instead of replacing it. It brings performance, caching, and rich semantics on top of your existing buckets—anywhere.

Goals

Alluxio focuses on three dominant workload categories:

  1. Large-Scale Model Training & Deployment

    High-throughput, POSIX-based access for distributed AI workloads.

  2. Ultra-Low-Latency Feature Store / Agentic Memory on Cloud

    Sub-millisecond access to Parquet and PB-scale data lakes directly on cloud storage.

  3. Multi-Cloud Data Sharing and Synchronization

    Unified namespace and caching across regions and clouds.

Non-Goals
  • Alluxio is not intended to be a general-purpose file system; it does not aim to implement full traditional filesystem semantics, only the subset required by AI training and inference workloads.
  • Alluxio does not provide native data durability; instead, it relies on underlying cloud object stores (e.g., S3, GCS, OCI Storage) for persistence.

3. Decentralized Architecture Overview

Alluxio Enterprise Edition adopts the Decentralized Object Repository Architecture (DORA), which is designed to deliver the extreme scalability, high availability, and top performance required by modern AI workloads.

3.1 Centralized vs. Decentralized Metadata Service

When the Alluxio open-source project was launched in 2013, the system followed the classic HDFS/GFS-style master–worker model designed for big data workloads. In this model, the master maintains the global metadata directory, including an inode tree and its journal (edit logs), while workers store cached data blocks in a distributed way. This architecture served analytics frameworks such as Spark, Presto, and Hive well, where workloads typically involved up to hundreds of millions of files and I/O patterns dominated by reads of sufficiently large files (tens to hundreds of megabytes).

As workloads shifted from large-scale batch analytics to AI training and multimodal data processing, the I/O access pattern changed dramatically. I/O for AI workloads is characterized by:

  • Billions of small files, often representing individual samples, embeddings, or feature shards.
  • Highly concurrent and bursty reads, driven by parallel GPUs or distributed data loaders.
  • Simplified access semantics, dominated by open–read–checkpoint cycles rather than complex renames, updates or appends.

These shifts in access patterns exposed the fundamental limitations of the centralized metadata service design:

  • Metadata Centralization:
    A single master typically maintains the entire inode tree and journal. Once the namespace scales beyond hundreds of millions of files, the master becomes a bottleneck. Restarting or replaying billions of journal entries can take hours.

  • I/O Path Inefficiency:
    Every metadata lookup goes through the master, adding network hops and round-trip latency. Under highly parallel GPU training workloads, this centralized routing layer quickly throttles throughput.

  • Failover Complexity:
    Even with HA mechanisms such as RAFT-based replicated journals, leader elections and log replays cause multi-minute downtime—unacceptable for latency-sensitive AI environments.

Meanwhile, the simplification of AI I/O semantics—read-heavy workloads with weak cross-file consistency—opened up an opportunity: the system no longer needs a centralized master to coordinate every operation.

This combination of increased parallelism and reduced coordination requirements made a fully decentralized architecture both necessary and feasible. These observations ultimately led to the design of DORA — the Decentralized Object Repository Architecture.

3.2 Embracing Full Decentralization in the AI Era

These lessons led to a fundamental redesign: Alluxio was transformed from a centralized, master-orchestrated system into a fully decentralized, stateless caching layer for both data and metadata.

In Alluxio Enterprise Edition 3.0 or later, no master node exists. Instead, all workers participate in a consistent hashing ring. Each worker acts as both a data and metadata cache for files within its assigned segments on the ring. Instead of contacting a master to obtain file metadata and then fetching data from a worker, the client connects directly to the appropriate worker using consistent hashing based on the file path. Cluster coordination and membership management are handled by lightweight service registries (e.g., etcd) rather than a master node. This architecture is illustrated below:

Figure 1: DORA Architecture Overview — Clients connect directly to workers via consistent hashing; no centralized master exists.

The major components are:

Client

Embedded within applications or frameworks. Uses a consistent hashing algorithm to locate the responsible worker directly based on the file path, eliminating centralized routing or lookup tables.
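To make this routing concrete, the following Python sketch shows how a client might map a file path onto a consistent hashing ring. This is an illustrative toy rather than Alluxio's actual implementation; the hash function, virtual-node count, and worker addresses are assumptions.

import bisect
import hashlib

def _hash(key: str) -> int:
    # Stable hash so every client maps the same path to the same worker.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, workers, virtual_nodes=128):
        # Each worker occupies many virtual positions to spread load evenly.
        self._ring = sorted(
            (_hash(f"{w}#{i}"), w) for w in workers for i in range(virtual_nodes)
        )
        self._keys = [h for h, _ in self._ring]

    def worker_for(self, path: str) -> str:
        # Walk clockwise from the path's hash to the next worker position.
        idx = bisect.bisect(self._keys, _hash(path)) % len(self._ring)
        return self._ring[idx][1]

ring = HashRing(["worker-1:29999", "worker-2:29999", "worker-3:29999"])
print(ring.worker_for("s3://bucket/train/sample-000123.tfrecord"))

Because every client computes this mapping locally, no lookup service sits on the read path, and adding or removing a worker only remaps the ring segments adjacent to it.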

Worker

The fundamental storage and caching unit. Each worker stores both data and metadata locally on NVMe, supports fine-grained page caching, and manages persistence using RocksDB. Workers operate independently and can join or leave dynamically.

Service Registry

Maintains cluster membership and service discovery. The default implementation uses etcd, which tracks active workers and their assigned data shards. It is not on the I/O path, ensuring no overhead on the critical path.
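As an illustration of lease-based membership, the sketch below registers a worker in etcd using the third-party etcd3 Python client; the key layout, TTL, and addresses are assumptions, not Alluxio's actual registry schema.

import time
import etcd3

client = etcd3.client(host="etcd-host", port=2379)

# Register this worker under a lease; if heartbeats stop, the key expires
# and the rest of the cluster treats the worker as inactive.
lease = client.lease(ttl=10)
client.put("/workers/worker-1", "worker-1:29999", lease=lease)

# In a real worker this heartbeat loop runs for the lifetime of the process.
for _ in range(3):
    lease.refresh()
    time.sleep(3)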

Coordinator

Manages background distributed tasks—such as prefetching, asynchronous loading, and replication—through a lightweight, stateless scheduling service. Provides observability and extensibility for custom job scheduling.

Together, these components form a decentralized, self-balancing architecture that eliminates single points of failure while scaling linearly with data volume and compute growth.

3.3 I/O Flow

Figure 1 illustrates the flow of a typical I/O request in Alluxio.

  1. Application–Worker RPC
    When the application (e.g., training job on a GPU server) makes a file operation (e.g., open or read), the client consults the consistent hashing ring to locate the responsible worker. The hash function ensures stable mappings even as workers join or leave the cluster.
  2. Local Cache Access
    The selected worker first checks its local NVMe cache (called the Page Store) for the requested data block or page. Metadata for the same object is co-located on the same worker in RocksDB, ensuring constant-time lookups and zero dependency on external metadata services.
  3. Cache Miss and Data Fetch
    On a cache miss, the worker fetches data directly from the underlying object store (e.g., S3, OCI Object Storage, or OSS) using range GET requests. The retrieved pages are cached asynchronously in the background for subsequent access, minimizing stall time for clients.

Through this flow, data is always fetched from the worker on the ring (local cache on cache hit and remote storage on miss).
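A simplified worker-side read path might look like the sketch below. The bucket and key names are placeholders, the in-memory dictionary stands in for the NVMe page store, and the cache fill is shown synchronously for brevity, whereas Alluxio populates its cache asynchronously.

import boto3

PAGE_SIZE = 4 * 1024 * 1024  # 4 MB pages (see Section 4)
page_cache = {}              # (path, page_index) -> bytes; stand-in for the NVMe page store

s3 = boto3.client("s3")

def read_page(bucket: str, key: str, page_index: int) -> bytes:
    cache_key = (f"{bucket}/{key}", page_index)
    if cache_key in page_cache:          # step 2: local cache hit
        return page_cache[cache_key]

    # step 3: cache miss, fetch only the needed byte range from the object store
    start = page_index * PAGE_SIZE
    end = start + PAGE_SIZE - 1
    resp = s3.get_object(Bucket=bucket, Key=key, Range=f"bytes={start}-{end}")
    page = resp["Body"].read()
    page_cache[cache_key] = page         # keep the page for subsequent reads
    return page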

3.4 Design Principles Summary

The DORA architecture embodies three key design principles:

  1. Scalability – Metadata and data are fully decentralized by consistent hashing, removing global locks and central bottlenecks. Each worker can independently manage tens of millions of files, allowing near-linear horizontal scaling.

  2. High Availability – There is no SPOF (single point of failure) because there are no components like a single master or journal. Cluster coordination via etcd ensures continuous service even under node failures.

  3. Top Performance – Direct client-to-worker access and hash-based sharding enable sub-millisecond latency and TB/s throughput across distributed GPU clusters.

4. Worker: The Cache Engine

Each Alluxio Worker is the basic storage and caching engine in the DORA architecture. Each worker maintains a unique Worker ID, persisted locally on disk. Upon startup or restart, the worker re-registers with the service registry (etcd) using this ID to reclaim its corresponding position on the consistent hashing ring. Based on its assigned segment, the worker becomes responsible for both the metadata and cached data objects belonging to that segment.

Each worker manages a portion of local storage capacity, ideally on NVMe SSDs for high throughput and low latency. For files in its hash shards on the ring, data and metadata are both persisted locally, ensuring that a node restart does not result in cache loss.

Fine-Granularity Caching

Alluxio Workers use fine-grained caching (as opposed to entire-object caching). Each cached object is split into pages, typically ≤ 4 MB in size.

Caching at smaller granularity reduces read amplification but increases management overhead, while larger pages may waste cache space on small or partial reads confined to a few hot regions (a pattern often seen when reading Parquet files, where footers and indexes are accessed far more frequently than the rest of the file).

Empirically, a 4 MB page size provides an optimal balance between cache efficiency and management overhead.
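The sketch below shows how a byte-range read maps onto fixed-size pages, assuming the 4 MB page size described above.

PAGE_SIZE = 4 * 1024 * 1024

def pages_for_read(offset: int, length: int):
    # Return the page indices touched by a read of `length` bytes at `offset`.
    first = offset // PAGE_SIZE
    last = (offset + length - 1) // PAGE_SIZE
    return list(range(first, last + 1))

# Reading a 1 KB Parquet footer near the end of a ~10 GB file touches a single
# page, so only ~4 MB is cached instead of the entire object.
print(pages_for_read(offset=10_737_400_000, length=1024))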

Cache Eviction

The page is also the smallest eviction unit. Each worker maintains LRU (Least Recently Used) queues to manage the page lifecycle. When local capacity is full, cold pages are evicted in LRU order to make space for new data.
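A minimal sketch of LRU page eviction follows; capacity accounting (counted in pages rather than bytes) and concurrency are simplified relative to a real worker.

from collections import OrderedDict

class LRUPageCache:
    def __init__(self, capacity_pages: int):
        self.capacity = capacity_pages
        self.pages = OrderedDict()  # (file, page_index) -> bytes

    def get(self, key):
        if key not in self.pages:
            return None                     # cache miss
        self.pages.move_to_end(key)         # mark as most recently used
        return self.pages[key]

    def put(self, key, data: bytes):
        self.pages[key] = data
        self.pages.move_to_end(key)
        while len(self.pages) > self.capacity:
            self.pages.popitem(last=False)  # evict the coldest page first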

Per-File Metadata Cache

In addition to data pages, each worker caches file-level metadata such as size, modification time, and permissions. As a result, metadata operations like fstat are served directly from the worker-side cache, avoiding RPCs to any central service.

Zero-Copy Data Transfer

The control path between clients and workers uses gRPC for metadata and coordination operations but not for the data path.

For data reads and writes, DORA employs a Netty-based zero-copy I/O pipeline, bypassing gRPC serialization and user-space buffering.

This reduces context switching and CPU overhead, improving throughput by 30–50% compared with traditional RPC-based data transfers.

In summary, each worker is a self-contained caching node that integrates:

  • Hash-based namespace sharding for scalability

  • Page-level caching and metadata colocation for efficiency

  • Persistent local storage for durability across restarts

  • Zero-copy networking for top-tier performance

5. Under File System (UFS): The Persistent Layer

Alluxio connects seamlessly with a wide range of existing storage systems—both cloud and on-premises—through its Under File System (UFS) abstraction. These connectors allow Alluxio to act as a data acceleration and unification layer on top of heterogeneous storage backends such as S3, GCS, HDFS, and NAS, without requiring any modification to existing data or infrastructure.

Each DORA worker interacts directly with its assigned UFS for data fetching and persistence. When data is first accessed, the worker retrieves it from the UFS and caches it locally on NVMe for subsequent high-speed access, while maintaining a consistent view of the namespace with the original source.

5.1 Supported Storage Systems

Alluxio provides native integrations for all major cloud object stores—including Amazon S3, Google Cloud Storage, Azure Blob Storage, Alibaba OSS, Tencent COS, and others—as well as on-premises systems such as HDFS, MinIO, Ceph, and NFS. Through the UFS layer, Alluxio exposes a unified file and object interface, automatically adapting to backend semantics such as eventual consistency, multipart upload, and authentication models. This design enables enterprises to combine multiple backends—such as linking multi-region S3 buckets and on-prem HDFS clusters—under a single Alluxio namespace.

5.2 UFS as the Source of Truth and Consistency Model

Within the DORA architecture, Alluxio serves as a stateless acceleration layer and the Under File System (UFS) serves as the persistent layer—the ultimate source of truth for all data. Alluxio's cache provides high-speed, transient access, while the UFS guarantees durability and long-term consistency.

Alluxio maintains cache consistency through validation and synchronization optimized for performance-critical AI workloads:

  • Read-Through:
    When a file is first accessed, the worker retrieves it directly from the UFS and caches both data and metadata locally. Subsequent reads are served from the cache unless the UFS version has changed.

  • Write and Sync Behavior:
    Write operations follow a write-through or write-back (beta) policy depending on configuration:
    • Write-through immediately persists changes to the UFS, ensuring full durability.
    • Write-back batches and asynchronously flushes updates for higher performance, suitable for temporary or intermediate results.

In both cases, the UFS remains the single authoritative data store.

  • Configurable TTL and Refresh:
    Users can define Time-to-Live (TTL) to control how long cached data is trusted before revalidation.  Shorter TTL values ensure stronger consistency, while longer TTLs maximize performance for stable or read-only datasets such as model checkpoints and training data snapshots.

This model provides a flexible balance between correctness and performance, allowing Alluxio to safely accelerate read-heavy AI workloads—where datasets are immutable or rarely updated—while still preserving consistency for write-intensive applications.

In short, the UFS serves as the durable source of truth, and Alluxio’s cache layer maintains coherence through validation, TTL refresh, and policy-driven synchronization.
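As a rough illustration of TTL-based revalidation, the sketch below trusts cached metadata for a fixed window and re-checks the UFS afterwards; the TTL value and the UFS lookup helper are hypothetical stand-ins rather than Alluxio configuration.

import time

TTL_SECONDS = 300      # illustrative TTL; tune per workload
metadata_cache = {}    # path -> (metadata, cached_at)

def fetch_metadata_from_ufs(path: str) -> dict:
    # Hypothetical stand-in for a real UFS HEAD/stat request.
    return {"path": path, "size": 0}

def get_metadata(path: str) -> dict:
    entry = metadata_cache.get(path)
    if entry is not None:
        metadata, cached_at = entry
        if time.time() - cached_at < TTL_SECONDS:
            return metadata                          # still within TTL: trust the cache
    metadata = fetch_metadata_from_ufs(path)         # revalidate against the UFS
    metadata_cache[path] = (metadata, time.time())
    return metadata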

5.3 Key Takeaways

  • Seamless Integration: Alluxio connects with all major cloud and on-premises storage systems through UFS connectors.

  • Persistent Layer of Truth: UFS maintains durable, consistent data storage while Alluxio focuses on locality and acceleration.

6. User-facing: Multi-Protocol Access

Alluxio provides multiple interfaces for applications to access data, ensuring compatibility with a wide range of existing tools and frameworks. It also offers powerful features to optimize performance and ensure high availability. This section provides an overview of the primary data access methods and related features.

Alluxio offers several ways for applications and users to interact with the data it manages:

  • POSIX API via FUSE: Mount Alluxio as a local filesystem, allowing any application or command-line tool (ls, cat, cp) to interact with Alluxio using standard file operations. This is the most common method for seamless integration with existing applications, especially for ML/AI training workloads.
  • S3 API: Expose an S3-compatible endpoint, allowing applications built with AWS S3 SDKs (such as Python's boto3 or the Java S3 client) to connect to Alluxio. This is ideal for data science and ML workloads that are already integrated with S3 (see the sketch after this list).
  • Python API via FSSpec: A Pythonic filesystem interface (alluxiofs) for developers using libraries like Pandas, PyArrow, and Ray. It provides a native and efficient way to interact with Alluxio within the Python ecosystem.
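For example, an application already written against the S3 API can be pointed at Alluxio by changing only its endpoint. The endpoint URL, credentials, bucket, and key below are placeholders; replace them with values from your own deployment.

import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://alluxio-s3-endpoint:39999",  # placeholder address
    aws_access_key_id="user",                         # placeholder credentials
    aws_secret_access_key="placeholder",
)

# Existing S3-based code continues to work, now served through Alluxio's cache.
resp = s3.get_object(Bucket="training-data", Key="datasets/sample.parquet")
data = resp["Body"].read()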

7. Fault Tolerance

Alluxio is designed to be highly resilient to failures. It includes multiple mechanisms to ensure that I/O operations continue gracefully even when components become unavailable, as in the following scenarios:

Network Partition

If a client tries to read data from a worker and that worker is unavailable due to network issues, the client can automatically fall back to reading the data directly from the Under File System. This ensures the application's read request succeeds without interruption, even if the Alluxio cluster becomes unresponsive.
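Conceptually, this fallback behaves like the sketch below; both helper functions are hypothetical stand-ins for the real worker RPC and the direct UFS read.

def read_from_worker(path: str) -> bytes:
    # Stand-in for the RPC to the worker selected by consistent hashing.
    raise ConnectionError("worker unreachable")

def read_from_ufs(path: str) -> bytes:
    # Stand-in for a direct object-store read (e.g., a ranged S3 GET).
    return b"...object bytes..."

def read_with_fallback(path: str) -> bytes:
    try:
        return read_from_worker(path)   # normal accelerated path
    except ConnectionError:
        return read_from_ufs(path)      # degrade gracefully to the UFS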

Worker Restart

Workers may need to restart due to hardware upgrades or maintenance. After restarting, the worker reloads its persisted Worker ID and reclaims its segment on the ring. Since both metadata and data pages are stored on local persistent storage (NVMe + RocksDB), cached data is automatically rediscovered and reused—no cold rebuild is required.

Hardware Failure

If a worker becomes unreachable for a configured timeout, the service registry (etcd) marks it inactive. The hashing ring then automatically rebalances: subsequent requests mapped to that segment are rerouted to neighboring workers. These requests result in cold reads from the underlying object store, but clients experience no visible errors or retries.

8. Summary

Alluxio has evolved from a “Big Data Acceleration Layer” into an AI-Native Data Access Platform. Through its decentralized DORA architecture, page-level caching, and cloud-native orchestration, Alluxio delivers a unique combination of performance, scalability, and cost-efficiency.

With Alluxio:
  • Data is always close to compute.
  • GPUs never wait.
  • AI workloads run anywhere, seamlessly.
