We are excited to announce the release of Alluxio 2.5.0! This is the first release on the Alluxio 2.5.X line.
Alluxio’s capabilities as a Data Orchestration framework have encouraged users to onboard more, if not all, of their data-driven applications to an Alluxio-powered data access layer. Alluxio 2.5 focuses on two Northbound APIs: POSIX and S3. Combined with the HDFS interface, these two interfaces make up a majority of the APIs preferred by big data applications.
AI/ML workloads have become increasingly popular on Alluxio, and the improvements in the POSIX interface serve to further the performance and compatibility of Alluxio with these workloads.
At the same time, Alluxio continues to integrate with the latest cloud and cluster orchestration technologies. In 2.5, Alluxio has new connectors for Google Cloud Storage and Azure Data Lake Storage Gen 2, as well as improved operability in Kubernetes environments.
Highlights
JNI-Based POSIX API
Alluxio 2.5 introduces a new JNI-based FUSE integration to support POSIX data access. This integration improves performance by 3x to 5x for high-concurrency, performance-sensitive workloads such as AI/ML training. For more information, see the docs.
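Because the FUSE integration exposes Alluxio through the POSIX interface, an application can read data with ordinary file I/O and no Alluxio-specific client. The following is a minimal sketch of that idea, not an official example: the mount point `/mnt/alluxio-fuse`, the `ALLUXIO_FUSE_MOUNT` environment variable, and the sample file path are all hypothetical placeholders.

```python
import os


def read_in_chunks(path, chunk_size=4 * 1024 * 1024):
    """Yield successive chunks of a file using ordinary POSIX reads."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk


def total_bytes(path, chunk_size=4 * 1024 * 1024):
    """Return the number of bytes read from the file at `path`."""
    return sum(len(chunk) for chunk in read_in_chunks(path, chunk_size))


if __name__ == "__main__":
    # Assumption: Alluxio is already mounted through FUSE at this
    # hypothetical location; adjust it to match your deployment.
    mount_point = os.environ.get("ALLUXIO_FUSE_MOUNT", "/mnt/alluxio-fuse")
    sample = os.path.join(mount_point, "training", "part-00000")  # hypothetical file
    if os.path.exists(sample):
        print(f"read {total_bytes(sample)} bytes from {sample}")
    else:
        print(f"{sample} not found; is Alluxio mounted at {mount_point}?")
```

The same functions work unchanged on a local file, which is the point of a POSIX interface: training code written against the local file system should not need to change when the data is served by Alluxio.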
S3 Northbound API
Alluxio 2.5 improves the compatibility of the S3 Northbound API, including basic authorization, bucket listing, and directory operations. Alluxio 2.5 is compatible with S3 browsing software such as s3browser, allowing administrators to maintain and manage the Alluxio namespace through an object storage console. For more information, see the docs.
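As an illustration of what S3 compatibility means for client code, the sketch below points a generic S3 client at an Alluxio proxy and lists the objects in a bucket. It rests on several assumptions that should be checked against the docs for your deployment: that the S3 API is served by the Alluxio proxy at port `39999` under the path `/api/v1/s3`, that the access key is interpreted as the Alluxio user name, and that the third-party `boto3` package is installed. The helper names are hypothetical.

```python
from urllib.parse import urlunsplit


def alluxio_s3_endpoint(proxy_host, proxy_port=39999, scheme="http"):
    """Build the endpoint URL for the S3 API served by an Alluxio proxy.

    Assumption: the default port (39999) and the /api/v1/s3 path may
    differ in your deployment.
    """
    return urlunsplit((scheme, f"{proxy_host}:{proxy_port}", "/api/v1/s3", "", ""))


def list_keys(endpoint_url, bucket, user="alluxio"):
    """List object keys in `bucket` through an S3-compatible endpoint."""
    # Imported lazily so the URL helper works without boto3 installed.
    import boto3
    from botocore.config import Config

    s3 = boto3.client(
        "s3",
        endpoint_url=endpoint_url,
        aws_access_key_id=user,          # assumption: treated as the Alluxio user
        aws_secret_access_key="unused",  # placeholder; boto3 requires a value
        region_name="us-east-1",         # placeholder region for request signing
        config=Config(s3={"addressing_style": "path"}),
    )
    response = s3.list_objects(Bucket=bucket)
    return [obj["Key"] for obj in response.get("Contents", [])]


if __name__ == "__main__":
    endpoint = alluxio_s3_endpoint("localhost")
    print(endpoint)
    # Requires a running Alluxio proxy and an existing bucket (hypothetical name):
    # print(list_keys(endpoint, "my-bucket"))
```

Path-style addressing is chosen here because the bucket name then travels in the URL path rather than in a DNS subdomain, which avoids needing per-bucket host names for the proxy.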
ADLS Gen2 Connector
Alluxio 2.5 introduces a connector for Azure Data Lake Storage Gen 2. This allows users to benefit from the various optimizations provided by ADLS Gen2 when using Azure object storage. For more information, see the docs.
Native GCS Connector
Alluxio 2.5 updates the Google Cloud Storage connector to use the native Google-provided SDK. This enables users to benefit from the latest optimizations and features available in the GCS SDK, such as login based on a JSON file. For more information, see the docs.
Remote Logging in K8s Environments
Alluxio 2.5 supports a remote log server in K8s environments. One challenge in a containerized environment is that logs are discarded or overwritten when a container is killed or restarted. With the remote logger, logs are sent to a centralized location (a dedicated pod). For more information, see the docs.
Other Improvements
- Add and improve metrics (91585c8) (4d2fd25) (bcaf7f8) (287658) (be35643) (67dc33d)
- Improve log and error messages (6e9577e) (eba2b98) (d01dbf3) (b13a2d4) (6c581b) (7fd2c0b) (f0192da) (302d386) (ca4e8e7) (a000798) (eaed75b) (29631bd) (1351717) (ecd5b3e) (446f71a)
- Improve test coverage and code stability (b77255c) (0fe89d3) (8b43a7b) (cb1aeb1)
- Update dependency versions (0e2ed87b) (192dfa1) (97ac1a2)
- Improve distributed job commands (4b403bf) (06f6224)
- Support logLevel command for job master/worker (0c72e4e) (bcddba8)
- Improve logging in Docker and Kubernetes environments (bf06be1) (97625c4)
- Improve embedded journal with large entries from catalog service (d4c9793) (d5281a1) (6d07997)
- Add COSN UFS for better read and write support for COS (0de2af1)
- Add async-profiler and Arthas to the Alluxio Docker image for easy debugging and profiling (e2b65c2) (0bbecc7)
- Skip creating parent directories for object stores to improve performance (d8517db)
- Support running Alluxio in IntelliJ by importing run configurations (86df061)
- Add initial journal replay before ZK connections in UFS journal to support replaying large journals (666d51b)
- Improve tiered store space allocation method (215aaac)
- Add statelock command to get state lock thread holders (57fd5da)
- Add maven plugin for proto lock (8d1f455)
- Help Alluxio client find correct URI authority instead of throwing exceptions (0a745cc)
- Support metrics system configuration in Helm chart (9833d5)
- Display the UFS capacity and used size in human-readable form (ec87831)
- Get a chunk of PartitionColumnStatistics at a time from Hive (5dfa1c3)
- Support using IP as connect host (9d0018f)
Bug Fixes
- Fix and improve Alluxio Dataproc and EMR scripts (614824e4) (648090ce) (b346d89) (f7c2acfa) (91f9220)
- Fix Alluxio metadata synchronization with UFS (c7f5c1bd) (446f71a)
- Update sync status after processing the entire sync (e8b5f6d)
- Fix cp command with wildcard and special characters in path (6ed5c14)
- Automatically close the worker client when a file read finishes to avoid resource exhaustion (53e9cee)
- Avoid Active Sync manager connecting to UFS when becoming secondary master (ded47c80)
- Fix NullPointerException in worker tier promote task (763381a)
- Fix AbstractWriteHandler abort (3ecf217)
- Cancel in-progress checkpoints when the thread is shut down (d1b4893)
- Pass options to DelegatingFileSystem (39556e9)
- Fix passing of Java opts in Docker and Kubernetes environments (7a7ebd5) (9be8860)
- Fix comma-separated medium types parsing in Helm chart (16daf4b3)
- Fix block location iteration with RocksDB (0aff9389)
- Fix Object UFS edge case (fcf0bbcd) (fec5421)
- Resolve null values in structured data service with Glue (35e8a9)
Acknowledgment
We have a long list of community contributors who helped with Alluxio 2.5.0. This release would not have been possible without your support! In particular, we would like to thank:
- Ke Wang from Facebook for improving local cache
- Yang Che from Alibaba, Chao Wang, Mickey Zhang from Microsoft, Yili Luo from Nanjing University, and Baolong Mao from Tencent for the implementation and optimization of the new JNI FUSE integration
- Pan Liu, Sheng Liu, Han Zhang, Runzhi Wang, and Baolong Mao from Tencent for improving Presto related integration and COS integration
- Zac Blanco from UCSD, Ce Zhang from China Unicom, Jiankang Li, and Micah Zhao for improving documentation, code style, and general code health
- Nirav Chotai for improving the Alluxio Helm chart integration
- Yichuan Huang from Robinhood for improving Alluxio’s structured data service
- GitHub user goodoid for improving security in the Docker image
Enjoy the new release! We look forward to hearing your feedback on the community Slack channel.