Alluxio 2.5.0 Release

We are excited to announce the release of Alluxio 2.5.0! This is the first release on the Alluxio 2.5.X line.

Alluxio’s capabilities as a Data Orchestration framework have encouraged users to onboard more, if not all, of their data-driven applications to an Alluxio-powered data access layer. Alluxio 2.5 focuses on two Northbound APIs: POSIX and S3. These two interfaces, combined with the HDFS interface, make up a majority of the APIs preferred by big data applications.

AI/ML workloads have become increasingly popular on Alluxio, and the improvements to the POSIX interface further improve Alluxio's performance and compatibility with these workloads.

At the same time, Alluxio continues to integrate with the latest cloud and cluster orchestration technologies. In 2.5, Alluxio has new connectors for Google Cloud Storage and Azure Data Lake Storage Gen 2, as well as improved operability in Kubernetes environments.

Highlights

JNI-Based POSIX API

Alluxio 2.5 introduces a new JNI-based FUSE integration to support POSIX data access. This integration improves performance by 3x to 5x for high-performance, high-concurrency workloads such as AI/ML training. For more information, see the docs.
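
As an illustration of what POSIX access means for an application, the sketch below reads a file with ordinary Python file I/O. It assumes an Alluxio FUSE mount already exists; the mount point /mnt/alluxio-fuse and the file path are hypothetical, and the helper names are made up for this example. Nothing here is Alluxio-specific, which is the point: once the namespace is mounted, no client library is needed.

```python
import os


def read_in_chunks(path, chunk_size=4 * 1024 * 1024):
    """Yield successive byte chunks of a file using plain POSIX reads."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk


def total_bytes(path, chunk_size=4 * 1024 * 1024):
    """Return the number of bytes read from the file at `path`."""
    return sum(len(chunk) for chunk in read_in_chunks(path, chunk_size))


if __name__ == "__main__":
    # Hypothetical path under an assumed FUSE mount point.
    sample = "/mnt/alluxio-fuse/training/part-00000"
    if os.path.exists(sample):
        print(total_bytes(sample))
```

A training job would typically hand such paths to its data loader unchanged, as if they were on local disk.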

S3 Northbound API

Alluxio 2.5 improves the compatibility of the S3 Northbound API including basic authorization, bucket listing, and directory operations. Alluxio 2.5 is compatible with S3 browsing software such as s3browser, allowing administrators to maintain and manage the Alluxio namespace through an object storage console. For more information, see the docs.
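
To make the shape of S3-style requests concrete, here is a minimal sketch that builds request URLs for buckets and objects. It assumes the S3 API is served by an Alluxio proxy at http://localhost:39999/api/v1/s3; treat that endpoint as an assumption to verify against the docs for your deployment. The helper s3_url and the bucket and key names are hypothetical.

```python
from urllib.parse import quote

# Assumption: Alluxio proxy on localhost:39999 with the S3 API under /api/v1/s3.
DEFAULT_ENDPOINT = "http://localhost:39999/api/v1/s3"


def s3_url(bucket="", key="", endpoint=DEFAULT_ENDPOINT):
    """Build a URL for the endpoint root, a bucket, or an object key."""
    parts = [endpoint.rstrip("/")]
    if bucket:
        parts.append(quote(bucket, safe=""))
    if key:
        # Keep "/" so keys that look like directory paths stay readable.
        parts.append(quote(key.lstrip("/"), safe="/"))
    return "/".join(parts)


if __name__ == "__main__":
    print(s3_url())                                   # endpoint root
    print(s3_url("testbucket"))                       # a bucket
    print(s3_url("testbucket", "dir/my file.txt"))    # an object in a bucket
```

An S3 client or browser tool pointed at the same endpoint would issue requests against URLs of this form.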

ADLS Gen2 Connector

Alluxio 2.5 introduces a connector for Azure Data Lake Storage Gen 2. This allows users to benefit from the various optimizations provided by ADLS Gen2 when using Azure object storage. For more information, see the docs.

Native GCS Connector

Alluxio 2.5 updates the Google Cloud Storage connector to use the native Google-provided SDK. This enables users to benefit from the latest optimizations and features available from the GCS SDK, such as JSON-file-based login. For more information, see the docs.

Remote Logging in K8s Environments

Alluxio 2.5 supports a remote log server in K8s environments. One challenge users face in a containerized environment is that logs are discarded or overwritten when a container is killed or restarted. With the remote logger, logs are sent to a centralized location (a dedicated pod). For more information, see the docs.

Other Improvements

Bug Fixes

  • Fix and improve Alluxio Dataproc and EMR scripts (614824e4) (648090ce) (b346d89) (f7c2acfa) (91f9220)
  • Fix Alluxio metadata synchronization with UFS (c7f5c1bd) (446f71a)
  • Update sync status after processing the entire sync (e8b5f6d)
  • Fix cp command with wildcard and special characters in path (6ed5c14)
  • Auto-close the worker client when a file read finishes to avoid resource exhaustion (53e9cee)
  • Avoid Active Sync manager connecting to UFS when becoming secondary master (ded47c80)
  • Fix NullPointerException in worker tier promote task (763381a)
  • Fix AbstractWriteHandler abort (3ecf217)
  • Cancel in-progress checkpoints when the thread is shut down (d1b4893)
  • Pass options to DelegatingFileSystem (39556e9)
  • Fix Java opts passing in Docker and Kubernetes environments (7a7ebd5) (9be8860)
  • Fix comma-separated medium types parsing in Helm chart (16daf4b3)
  • Fix block location iteration with RocksDB (0aff9389)
  • Fix Object UFS edge case (fcf0bbcd) (fec5421)
  • Resolve null values in structured data service with Glue (35e8a9)

Acknowledgment

We have a long list of community contributors who helped with Alluxio 2.5.0. This release would not have been possible without your support! In particular, we would like to thank:

  • Ke Wang from Facebook for improving local cache
  • Yang Che from Alibaba, Chao Wang, Mickey Zhang from Microsoft, Yili Luo from Nanjing University, and Baolong Mao from Tencent for the implementation and optimization of the new JNI FUSE integration
  • Pan Liu, Sheng Liu, Han Zhang, Runzhi Wang, and Baolong Mao from Tencent for improving Presto related integration and COS integration
  • Zac Blanco from UCSD, Ce Zhang from China Unicom, Jiankang Li, Micah Zhao for improving documentation, code style, and general code health
  • Nirav Chotai for improving Alluxio Helm chart integration
  • Yichuan Huang from Robinhood for improving Alluxio’s structured data service
  • GitHub user goodoid for improving security in the Docker image

Enjoy the new release! We look forward to hearing your feedback on our community Slack channel.