New Features in Alluxio Enterprise AI 3.6
May 20, 2025
By Hope Wang and Bill Hodak

We're excited to announce our latest release: Alluxio Enterprise AI 3.6! This new version brings significant enhancements focused on accelerating cold start for inference, optimizing checkpoint writing, and improving data access resilience.

Accelerate Cold Starts for Inference

Alluxio Enterprise AI 3.6 introduces capabilities to improve the performance of ‘cold starts’ when loading a new or updated model for serving inference.

For organizations running inference at scale, distributing models from training to production is a critical step. In this process, models are typically deployed to a centralized model repository and loaded from the repository into GPU memory on each inference server. As model file sizes grow larger and inference infrastructure scales across different regions and clouds, organizations face significant cold start latency along with growing egress and cloud access costs.

Take Alluxio customer RedNote as an example. This leading e-commerce and social media platform in Asia needs to fine-tune their search and recommendation model nightly and distribute terabytes of updated model files to their inference infrastructure that spans thousands of servers across multiple clouds. All of this needs to be completed before their 150 million active users become, well, active in the morning. RedNote deployed Alluxio to both accelerate their nightly fine-tuning workload by 41% and to supercharge model loading and cold starts by 10X, while also cutting cloud costs related to model distribution by 80%!

Alluxio boosts cold start speeds by placing Alluxio Distributed Cache in each region, allowing model files to be cached locally in the region instead of being copied to each server. Inference servers then retrieve new or updated model files directly from the Alluxio Distributed Cache. The system is further optimized by caching model files locally on each inference server, reading model files from the distributed cache once per inference server rather than once per GPU, and loading model files into memory from a local memory pool.
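To make this concrete, here is a minimal sketch of what an inference server's model-loading path might look like once Alluxio is in place. It assumes a hypothetical FUSE mount at /mnt/alluxio backed by the regional distributed cache; the mount point and model layout are illustrative, not a prescribed configuration.

```python
import time
import torch

# Hypothetical Alluxio FUSE mount backed by the regional distributed cache.
# The first read in a region pulls the model from the central repository
# into the cache; later reads on the same server hit the local cache and
# memory pool instead of the remote object store.
MODEL_PATH = "/mnt/alluxio/models/recsys/v2025-05-20/model.pt"

def load_state_dict(path: str) -> dict:
    start = time.perf_counter()
    # A plain POSIX read; Alluxio serves the bytes from cache.
    state = torch.load(path, map_location="cpu")
    print(f"loaded {path} in {time.perf_counter() - start:.1f}s")
    return state

state = load_state_dict(MODEL_PATH)
# model.load_state_dict(state)  # then move the model to GPU as usual
```

The point of the sketch is that the application code does not change: it reads an ordinary file path, and the caching layers decide where the bytes actually come from.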

Our benchmarks show impressive results: Alluxio Distributed Cache with Local Memory Pools achieves 32 GiB/s, exceeding the network's maximum throughput of 11.6 GiB/s because model files are served from local memory, while Alluxio Distributed Cache alone reaches 9.3 GiB/s over the same network.

Pushdown-based Parquet Query Acceleration [EXPERIMENTAL]

Alluxio Enterprise AI 3.6 includes a new query pushdown capability for accelerating queries on data stored in Parquet files. This exciting new feature was developed through a close collaboration with the engineering team at Salesforce. With Alluxio Distributed Caching and the new query pushdown capability, feature stores and retrieval-augmented generation applications that query Parquet data on S3 can leverage Alluxio to achieve sub-millisecond Time-to-First-Byte (TTFB), comparable to S3 Express One Zone but at a fraction of the cost and with the flexibility to scale beyond S3 Express's single-account throughput limit.
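While the pushdown API itself is experimental and documented separately, the access pattern it accelerates looks roughly like the sketch below: selective queries over Parquet data, here expressed with PyArrow against a hypothetical Alluxio FUSE mount over an S3 bucket. Paths and column names are illustrative only.

```python
import pyarrow.dataset as ds

# Hypothetical FUSE mount over an S3 bucket of Parquet feature data.
# Alluxio caches the Parquet footers and data pages, so repeated
# selective reads like this avoid S3 round trips.
features = ds.dataset("/mnt/alluxio/feature-store/users", format="parquet")

# A selective scan: only the filter column and requested columns are
# read, which is exactly the shape of query that pushdown targets.
table = features.to_table(
    columns=["user_id", "embedding"],
    filter=ds.field("user_id") == 424242,
)
print(table.num_rows)
```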

Join our upcoming webinar to learn more: https://www.alluxio.io/events/meet-you-in-the-middle-1000x-performance-for-parquet-queries-on-pb-scale-data-lakes.

Fast Model Training Checkpoint Writing [EXPERIMENTAL]

We announced the availability of Alluxio's CACHE_ONLY Write Mode just a few months ago and are excited to share that we've made it even better and faster. This release introduces the new ASYNC write mode, which provides up to 8 GiB/s of write throughput on a 100 Gbps network, significantly shortening the model training checkpoint process.

AI training workloads periodically write checkpoint files to 'save' the state of the partially trained model in case the workload fails and needs to be restarted. These checkpoint files are typically large and can take hours or longer to create. During checkpoint creation, model training pauses, significantly extending end-to-end training time.

Alluxio accelerates the performance of checkpoint file creation by writing to the Alluxio cache instead of directly to the underlying file system, which avoids network and storage bottlenecks. This faster checkpoint file creation accelerates end-to-end model training time. New in this release is the ASYNC write mode, which persists checkpoint files to the underlying file system (UFS) asynchronously.
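As an illustration, a training loop using the ASYNC write mode does not change its checkpointing code at all; it simply writes to a path served by Alluxio. The sketch below assumes a hypothetical FUSE mount and PyTorch; the write completes once it lands in the Alluxio cache, and persistence to the UFS happens in the background.

```python
import torch

# Hypothetical Alluxio FUSE mount configured with the ASYNC write mode.
# torch.save() returns as soon as the cache write completes; Alluxio
# persists the file to the underlying file system asynchronously.
CKPT_DIR = "/mnt/alluxio/checkpoints/run-42"

def save_checkpoint(model, optimizer, step: int) -> None:
    state = {
        "step": step,
        "model": model.state_dict(),
        "optimizer": optimizer.state_dict(),
    }
    # Training resumes as soon as this returns, instead of waiting
    # for the checkpoint to reach remote storage.
    torch.save(state, f"{CKPT_DIR}/step-{step}.pt")
```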

Read the Documentation

What’s New for Alluxio Admins

New Management Console

This release introduces a new Management Console that gives admins observability into the cluster and makes day-to-day Alluxio management easier.

This new web-based console provides a graphical interface that displays key cluster information in a clear, actionable format. The cluster monitoring dashboard shows the current state of the Alluxio cluster, including cache usage relative to allocated capacity and the status of Alluxio coordinators, workers, and etcd. Key metrics such as read/write throughput, cache eviction, and cache hit rate are prominently displayed.

The new console also supports key management tasks, which can be performed directly through the web interface without using command-line tools:

  • Managing the mount table
  • Configuring quotas
  • Managing priority and TTL policies
  • Submitting cache preload and free jobs
  • Collecting cluster diagnostic information

Read the Documentation

Multi-Tenancy Support

This release also brings multi-tenancy support, enabling multiple teams to securely share a single multi-tenant Alluxio cache.

Alluxio now enforces multi-tenancy policies through seamless integration with Open Policy Agent (OPA), an open-source framework for policy management. Organizations can define multi-tenant policies using OPA's fine-grained role-based access controls (RBAC), which are then enforced by a new Alluxio gateway component. This enforcement works consistently across both the Alluxio APIs and the Management Console, ensuring uniform security throughout the platform.
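For a sense of how such an integration typically works, the sketch below shows a policy check against OPA's standard REST API. The policy package, input fields, and endpoint are hypothetical; the actual policies Alluxio enforces are the ones your organization defines in Rego and serves from OPA.

```python
import json
import urllib.request

# OPA's data API: POST /v1/data/<policy path> with an "input" document.
# The "alluxio/authz" package and the input fields are illustrative only.
OPA_URL = "http://localhost:8181/v1/data/alluxio/authz/allow"

def is_allowed(user: str, tenant: str, path: str, action: str) -> bool:
    payload = json.dumps({
        "input": {"user": user, "tenant": tenant, "path": path, "action": action}
    }).encode()
    req = urllib.request.Request(
        OPA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp).get("result", False)

# e.g. a gateway component would deny a cross-tenant read like this one:
print(is_allowed("alice", "team-a", "/team-b/models/llm.pt", "read"))
```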

Read the Documentation

Multi-Availability Zone Failover Support

In this release, we’ve added support for data access failover in multi-Availability Zone architectures to provide high availability and stronger data access resilience. This enhancement is particularly valuable for model distribution use cases, ensuring uninterrupted access to your models even when an Availability Zone goes down.

Keeping AI applications running through a cloud Availability Zone outage is critical for continuous operations: seamless, accelerated access to AI data during an outage prevents costly downtime and ensures uninterrupted service delivery.

Alluxio configures clusters in multiple Availability Zones automatically, without requiring manual installation. Cached data is synchronized across all clusters, and during an Availability Zone outage the Alluxio client transparently routes requests to clusters in other Availability Zones.
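Conceptually, the client-side behavior resembles the failover pattern sketched below: try the cache cluster in the local Availability Zone first, then fall back to peers. This is an illustration of the pattern only, not Alluxio's internal implementation; the endpoints and the read stub are hypothetical.

```python
import random

# Illustrative failover pattern only; not Alluxio's actual client code.
# Hypothetical cache cluster endpoints, local Availability Zone first.
CLUSTERS = [
    "cache.us-east-1a.internal",
    "cache.us-east-1b.internal",
    "cache.us-east-1c.internal",
]

def read_from_cluster(endpoint: str, path: str) -> bytes:
    """Stub standing in for a cache read; fails when an AZ is 'down'."""
    if random.random() < 0.3:  # simulate an Availability Zone outage
        raise ConnectionError(f"{endpoint} unreachable")
    return f"bytes of {path} from {endpoint}".encode()

def read_with_failover(path: str) -> bytes:
    errors = []
    for endpoint in CLUSTERS:
        try:
            return read_from_cluster(endpoint, path)
        except ConnectionError as exc:
            errors.append(str(exc))  # this AZ is down; try the next one
    raise RuntimeError(f"all Availability Zones failed: {errors}")

print(read_with_failover("/models/recsys/v2025-05-20/model.pt"))
```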

Read the Documentation

Virtual Path Support in FUSE [EXPERIMENTAL]

Alluxio Enterprise AI 3.6 introduces virtual path support, allowing users to define custom access paths to their data resources. By creating these virtual paths, administrators can establish an abstraction layer that effectively masks the actual physical data locations in the underlying storage systems. This abstraction provides flexibility when managing complex data architectures across multiple storage systems or organizational boundaries.
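The idea can be illustrated with a toy resolver: clients see stable virtual paths, while the mapping to physical locations lives in one place and can change without touching client code. The mapping below is purely hypothetical; Alluxio's actual configuration format is covered in the documentation.

```python
# Toy illustration of virtual-path resolution; not Alluxio's actual
# configuration format, which is described in the documentation.
VIRTUAL_PATHS = {
    "/datasets/clickstream": "s3://prod-data-lake/events/clickstream/v3",
    "/models/recsys": "gs://ml-artifacts/recsys/releases",
}

def resolve(virtual_path: str) -> str:
    for prefix, physical in VIRTUAL_PATHS.items():
        if virtual_path.startswith(prefix):
            return physical + virtual_path[len(prefix):]
    raise FileNotFoundError(virtual_path)

# Clients keep using /datasets/clickstream even if the data moves:
print(resolve("/datasets/clickstream/2025-05-20/part-0.parquet"))
```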

Read the Documentation

Want to learn more about Alluxio Enterprise AI? Schedule a demo today!
