The Alluxio team is thrilled to announce the release of v2.6 of the Alluxio Data Orchestration Platform. Version 2.6 of both the free open source Community Edition and Alluxio Enterprise Edition is now generally available.
Alluxio 2.6 significantly improves the performance of data-intensive AI/ML workloads across any storage, and also improves the general maintainability and visibility of Alluxio clusters, especially for large-scale deployments. We have taken the feedback and contributions from the community and introduced features that simplify deployment, add new data management capabilities, optimize performance, and provide enhanced visibility into system behavior.
With this release, Alluxio expands the spectrum of data-driven workloads that benefit from Alluxio’s data orchestration capabilities. At the same time, time to production and time to value are significantly reduced with improved visibility and ease of operation.
Free downloads of Alluxio Community Edition and free trials of Alluxio Enterprise Edition can be found here. Join thousands of members in our Slack channel to ask questions and share your feedback. Thank you to everyone who contributed to this release.
Streamlined Data Orchestration for AI/ML Workloads
Performance is a key benefit Data Orchestration brings to AI/ML workloads, and Alluxio has provided significant value to users in that respect.
Alluxio 2.6 builds upon the work done in Alluxio 2.5 to provide a complete solution for AI/ML workloads. The improvements span from the very first task of deploying Alluxio to the ever-difficult goal of monitoring the system once it is running production workloads.
From a deployment perspective, most users take a containerized approach, often leveraging Kubernetes for container orchestration. To simplify deployments, we combined the Alluxio worker and FUSE processes. Users no longer need to configure multiple processes to be housed in the same Kubernetes pods or use other workarounds to ensure the availability of both the Alluxio worker and FUSE process on all of the required nodes.
Another benefit of consolidating the two processes is the ability to avoid inter-process communication. Reducing this communication overhead yielded significant performance improvements for workloads with a large number of small files, a common pattern in training workloads such as image recognition.
Finally, improvements to the user experience of data loading through Alluxio native commands such as distributed load make setting up training data much easier. This avoids the need for custom scripts or another system to load the data into the Alluxio cache.
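As a rough illustration of preloading training data with the distributed load command, the sketch below builds and runs an `alluxio fs distributedLoad` invocation from Python. The helper names, the `./bin/alluxio` location, and the `/training/images` path are hypothetical, and the `--replication` option is assumed to behave as described in the 2.x CLI documentation, so check it against the version you run.

```python
import subprocess
from typing import List


def build_distributed_load_cmd(path: str, replication: int = 1,
                               alluxio_bin: str = "./bin/alluxio") -> List[str]:
    """Build the argument list for `alluxio fs distributedLoad`.

    `alluxio_bin` is an assumed location of the Alluxio launcher script;
    adjust it to match your installation.
    """
    if not path.startswith("/"):
        raise ValueError("Alluxio paths must be absolute, e.g. /training/images")
    if replication < 1:
        raise ValueError("replication must be at least 1")
    return [alluxio_bin, "fs", "distributedLoad",
            "--replication", str(replication), path]


def preload(path: str, replication: int = 1) -> None:
    """Run the load; raises CalledProcessError if the CLI exits non-zero."""
    subprocess.run(build_distributed_load_cmd(path, replication), check=True)
```

Calling `preload("/training/images")` before a training job starts would ask the cluster to pull that directory into cache, assuming the launcher script is reachable from the current working directory.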
Improved System Visibility
Data Orchestration frameworks are involved in many mission-critical workflows. Therefore, visibility into Alluxio’s system status is paramount to successful operation, maintenance, and optimization of the system.
Alluxio 2.6 takes system visibility to the next level, providing detailed information enabling users to drill down into specific component behavior and trace request handling and timing. These new metrics and capabilities provide a much better toolkit for devops when troubleshooting a problem that has been narrowed down to a subset of the system.
System administrators should still rely on general statistics for Alluxio system observability. Alluxio 2.6 provides templates for common monitoring dashboard tools such as Grafana, so new deployments have a quick start for tracking Alluxio’s health. Drawing on collaboration with and reports from Alluxio users, we have also added documentation on interpreting the default metrics and adjusting system configuration or capacity accordingly.
System visibility is a key focus for the Alluxio project. In coming releases, we plan to further improve visibility by introducing logical Alluxio metrics such as file and block access rates, job progress and history, and cluster load heatmaps.