What's New in Alluxio 2.8: Enhanced S3 API Functionality, Enterprise-grade Security and Data Migration With Better Usability and Low Cost

May 4, 2022

Adit Madan

Today, we are thrilled to announce that Alluxio 2.8 is generally available (GA) for both the free open source Alluxio Community Edition and Alluxio Enterprise Edition!

Alluxio 2.8 focuses on the S3 API, enterprise-grade security, scalability, and observability in data migration. The enhanced S3 API makes managing Alluxio easier than ever. Features such as encryption at rest and policy-driven data management further improve Alluxio’s functionality to support enterprise customers.

Alluxio’s set of S3, HDFS and POSIX APIs enables storage-agnostic and multi-cloud platform deployments. Alluxio can be used with Spark, Presto, PyTorch and TensorFlow on various cloud platforms, such as AWS, GCP, and Azure Cloud, and on Kubernetes in private data centers or public clouds.

Download a free copy of Alluxio Community Edition or try the Enterprise Edition here. Thanks to the community for their valuable contributions to the Alluxio 2.8 release.

Highlights of Both Alluxio Community and Enterprise Edition

Enhanced S3 API with Metadata Tagging

The S3 API has become the de facto standard for object storage both on-premises and in the cloud. Using the S3 API, end-users of data-driven applications and admins can rapidly onboard Alluxio for new uses. Organizations can integrate Alluxio without introducing a custom driver for existing applications, which greatly streamlines DevOps.

Alluxio 2.8 enhances the support for the S3 API with metadata tagging capabilities.

Metadata operations can be performed through the newly added S3 object and bucket tagging APIs. Tags can also be set at upload time through the S3 tagging header (x-amz-tagging in the S3 API): by specifying a query-parameterized string in this header, you can attach user-defined tags to the uploaded object. Please refer to the Alluxio S3 documentation for full details on the tagging operations supported.
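As a rough illustration of what a "query-parameterized string" looks like, the sketch below builds an upload request that carries tags in the standard S3 `x-amz-tagging` header. It only constructs the request and does not send it. The endpoint URL, bucket name, object key, and tag names are assumptions chosen for the example, and a real deployment would typically go through an S3 client that also handles authentication.

```python
from typing import Dict
from urllib.parse import urlencode
from urllib.request import Request


def build_tagging_header(tags: Dict[str, str]) -> str:
    """Encode user-defined tags as a URL query string, e.g. 'team=ml&stage=raw'."""
    return urlencode(tags)


def build_put_request(endpoint: str, bucket: str, key: str,
                      body: bytes, tags: Dict[str, str]) -> Request:
    """Build (but do not send) a PUT request that uploads an object with tags."""
    url = f"{endpoint.rstrip('/')}/{bucket}/{key}"
    headers = {"x-amz-tagging": build_tagging_header(tags)}
    return Request(url, data=body, method="PUT", headers=headers)


# Hypothetical values: adjust the endpoint to wherever your S3-compatible
# API is exposed; none of these names come from the release announcement.
ENDPOINT = "http://localhost:39999/api/v1/s3"
request = build_put_request(
    ENDPOINT,
    bucket="my-bucket",
    key="datasets/train.csv",
    body=b"a,b\n1,2\n",
    tags={"team": "ml", "stage": "raw"},
)

print(request.get_method(), request.full_url)
print(request.header_items())
```

The same tags could then be read back or replaced through the object tagging API; the Alluxio S3 documentation lists which tagging operations are supported.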

Better Stability and Scalability for Training Workloads

FUSE2 is the current default for the FUSE client. Alluxio 2.8 adds support for FUSE3 integration, which enables future performance and scalability optimizations. FUSE2 will eventually be phased out in favor of FUSE3.

You can now also mount FUSE via the Alluxio CLI or configuration properties. The FUSE unmount mechanism is improved to reduce the chance of leaving FUSE mount points not cleanly unmounted on the host machines (see the documentation for FUSE unmount).

Highlights of Alluxio Enterprise Edition

Enterprise-grade Security: Data Encryption and Delegated Tokens for AssumeRole

As a major enhancement to enterprise-grade security, Alluxio 2.8 adds server-side encryption for securing and governing data. This helps security-sensitive organizations transmit and store sensitive data in line with strict encryption, compliance, and regulatory requirements. SSL protects data in transit, and server-side encryption now protects data at rest: data stored on an Alluxio worker is encrypted when written to disk and decrypted when it is read and sent to the client or UFS.

Alluxio 2.8 offers multiple encryption zones, with each encryption zone mapping to an Alluxio URI. By mapping paths in the file system, users can specify which parts of the namespace will be encrypted.
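To make the idea of path-based encryption zones concrete, here is a minimal sketch of how a path could be matched against a set of zone roots. It is not Alluxio's implementation: the function name, the zone paths, and the longest-prefix rule for nested zones are all assumptions made for illustration.

```python
from posixpath import normpath
from typing import List, Optional


def find_encryption_zone(path: str, zones: List[str]) -> Optional[str]:
    """Return the longest zone root containing `path`, or None if no zone applies."""
    path = normpath(path)
    best = None
    for zone in zones:
        root = normpath(zone)
        # Match on whole path components so "/finance" does not
        # accidentally cover "/finance-public".
        inside = path == root or path.startswith(root.rstrip("/") + "/")
        if inside and (best is None or len(root) > len(best)):
            best = root
    return best


# Hypothetical zone roots within the file system namespace.
ZONES = ["/finance", "/hr/payroll"]

for candidate in ["/finance/2022/q1.parquet", "/finance-public/readme.txt",
                  "/hr/payroll", "/hr/reviews/a.txt"]:
    print(candidate, "->", find_encryption_zone(candidate, ZONES))
```

Matching on path components rather than raw string prefixes is the main design point here: it keeps a zone from unintentionally covering sibling directories that merely share a name prefix.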

The key credentials can be stored in HashiCorp Vault. Learn how to use encryption at rest and which encryption algorithms are supported here.

Alluxio 2.8 also introduces a new under storage access token framework. Rather than storing credentials on each worker, workers request temporary access tokens from the master. With AWS S3, users can configure the worker to request a unique AssumeRole token from the master in order to access S3 objects. Read more about the AssumeRole feature for S3 in the documentation.
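The general pattern of workers holding short-lived tokens instead of long-lived credentials can be sketched as follows. This is a conceptual illustration only, not the Alluxio token framework: `AccessToken`, `WorkerTokenCache`, the fake master, the 900-second lifetime, and the 60-second refresh margin are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class AccessToken:
    value: str
    expires_at: float  # seconds, on the same clock as `now`


class WorkerTokenCache:
    """Worker-side cache that asks the master for a new token shortly before expiry."""

    def __init__(self, fetch_from_master: Callable[[], AccessToken],
                 now: Callable[[], float], refresh_margin: float = 60.0) -> None:
        self._fetch = fetch_from_master
        self._now = now
        self._refresh_margin = refresh_margin
        self._token: Optional[AccessToken] = None

    def get(self) -> AccessToken:
        token = self._token
        # Refresh when there is no token yet or it is about to expire.
        if token is None or self._now() >= token.expires_at - self._refresh_margin:
            token = self._fetch()
            self._token = token
        return token


# A fake clock and a fake master stand in for real time and a real RPC.
clock = [0.0]
issued: List[AccessToken] = []


def fake_master() -> AccessToken:
    token = AccessToken(value=f"token-{len(issued) + 1}", expires_at=clock[0] + 900.0)
    issued.append(token)
    return token


cache = WorkerTokenCache(fake_master, now=lambda: clock[0])
first = cache.get()    # no token cached yet, so the master is asked
clock[0] = 500.0
second = cache.get()   # still well before expiry, cached token is reused
clock[0] = 850.0
third = cache.get()    # inside the refresh margin, a new token is requested
print(first.value, second.value, third.value, len(issued))
```

Because tokens expire on their own, a compromised worker exposes only a short-lived credential rather than the underlying storage keys.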

Data Migration Across Heterogeneous Storage Systems

Data migration is a challenge for organizations with data stored across vendors, clouds, or regions. The data migration capabilities of Alluxio 2.8 further reduce vendor lock-in and allow organizations to choose whichever platform they want to use to store their data.

Alluxio 2.8 delivers usability enhancements for policy-driven data management, facilitating data access and movement between heterogeneous storage systems at reduced cost. As a result, organizations can use Alluxio for data movement, whether policy-based or manual, to choose the storage system that best suits their needs without application migrations.

The Alluxio 2.8 release also enhances the observability of policy-driven data management to allow users to troubleshoot problems when policy-driven jobs fail. With a new CLI and metrics that report the policy engine status, policy execution status, and action status, users have a deeper view into their data movement. This documentation describes the new CLI in detail.

More Info

For an exhaustive list of major features and bug fixes of Alluxio 2.8, please refer to the Community Edition release notes and Enterprise Edition release notes.

Download Alluxio 2.8 or schedule a demo today! Join 9,000+ members in our community Slack channel to ask questions and share your feedback.
