We are thrilled to announce the release of Alluxio 2.5!
Alluxio 2.5 focuses on improving interface support to broaden the set of data-driven applications that can benefit from data orchestration. The POSIX and S3 client interfaces have improved substantially in performance and functionality, driven by widespread usage and demand from AI/ML workloads and system administration needs. Alluxio is rapidly evolving to meet the needs of enterprises that are deploying it as a key component of their AI/ML stacks.
Downloads can be found here. Join thousands of members in our Slack channel to ask any questions and provide your feedback! Thank you to everyone who contributed to this release!
Data Orchestration for AI/ML Workloads
Alluxio’s Data Orchestration capabilities are immensely valuable for improving the performance and data pipelining of AI/ML workloads. For example, Alibaba saw an improvement of over 40% in training time and cost by deploying Alluxio (article).
AI/ML workloads naturally use high-spec machines with expensive GPUs, and pairing these GPUs with the appropriate I/O is critical for training efficiency and cost effectiveness. The cost of hardware combined with long training times makes acceleration a key goal for our users. By deploying Alluxio on these machines, users benefit from both distributed, high-performance storage and data management functionality. Specifically, our users see the Alluxio layer as necessary to feed growing GPU I/O demand, which is outpacing the growth of object storage and network I/O. Finally, we observed that our users were able to run Alluxio using only underutilized resources on the GPU nodes, such as memory, disk, and CPU, resulting in no additional cost or deployment overhead.
While Alluxio fits well into the AI/ML architecture, we still needed to overcome the challenges of API compatibility. Applications like TensorFlow and PyTorch most commonly use a POSIX API, as opposed to the HDFS-compatible API used by analytics workloads, so the Alluxio FUSE layer was a natural fit. To further improve the performance and capabilities of the interface, we implemented our own JNI FUSE layer, which replaces the legacy JNR-FUSE-based integration. JNI FUSE already solves compatibility issues and provides better latency and throughput in highly concurrent workloads, and we expect to further enhance its capabilities in upcoming Alluxio releases.
For further reading, check out this presentation by Microsoft and consider joining our special interest group, which meets weekly to discuss ongoing development.
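To illustrate why a POSIX interface matters here, the following is a minimal sketch of a data loader reading training files with ordinary file calls. It assumes Alluxio FUSE is already mounted at a local directory; the path `/mnt/alluxio-fuse`, the `.bin` suffix, and the helper names `iter_samples` and `total_bytes` are hypothetical and chosen only for illustration. Nothing in the code is Alluxio-specific, which is the point: the application sees a regular directory tree.

```python
import os
from typing import Iterator, Tuple

# Hypothetical mount point: wherever the Alluxio FUSE file system is mounted.
ALLUXIO_FUSE_MOUNT = "/mnt/alluxio-fuse"


def iter_samples(root: str, suffix: str = ".bin") -> Iterator[Tuple[str, bytes]]:
    """Yield (path, contents) for every file under root whose name ends with suffix.

    Only standard POSIX-style calls (directory walk, open, read) are used,
    so the same code works on a local disk or on a FUSE-mounted directory.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a deterministic order
        for name in sorted(filenames):
            if name.endswith(suffix):
                path = os.path.join(dirpath, name)
                with open(path, "rb") as f:
                    yield path, f.read()


def total_bytes(root: str, suffix: str = ".bin") -> int:
    """Return the combined size of all matching files under root."""
    return sum(len(data) for _, data in iter_samples(root, suffix))
```

A training script could call `iter_samples(ALLUXIO_FUSE_MOUNT)` and hand the bytes to its own decoding step, exactly as it would for a local dataset directory.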
Cloud Native Integrations
A large portion of Alluxio users deploy in the cloud, so Alluxio is committed to integrating closely with the cloud ecosystem. Alluxio 2.5 introduces improvements for all three major public cloud providers (AWS, GCP, and Azure) as well as Kubernetes, the de facto standard in container orchestration.
The latest connectors to cloud storage enable users to benefit from the recommended security models in the cloud, such as AWS’s Security Token Service (STS) and GCP’s service account keys. We have also introduced native support for Azure Data Lake Storage Gen2, which is the recommended service for building big data applications on Azure. ADLS Gen2 provides file-level semantics and optimizations as well as security.
For further reading, check out the docs for AWS, Azure, and GCP.
More Info
Want to hear from the core developers? Join us for the live webinar on the 2.5 release!
You can find more information in the 2.5.0 official release notes. Have questions? Come join the Community Slack Channel.