alluxio engineering Archives

Cross Cluster Synchronization in Alluxio – Part 3: Discussions and Conclusion

February 8, 2023 By Tyler Crain

Following part 1 and part 2, this final blog of the series discusses some design decisions and details, as well as certain future work. Discussions and Future Work Why not exactly once delivery for pub/sub? As we know, exactly once message delivery for pub/sub would greatly simplify our design and there do exist many powerful … Continued

Cross Cluster Synchronization in Alluxio – Part 2: Mechanism

February 8, 2023 By Tyler Crain

This is part 2 of the blog series talking about the design and implementation of the Cross Cluster Synchronization mechanism in Alluxio. In the previous blog, we discussed the scenario, background and how metadata sync is done with a single Alluxio cluster. This blog will describe how metadata sync is built upon to provide metadata … Continued

Cross Cluster Synchronization in Alluxio – Part 1: Scenarios and Background

February 8, 2023 By Tyler Crain

This is a blog series talking about the design and implementation of the Cross Cluster Synchronization mechanism in Alluxio. This mechanism ensures that the metadata is consistent when running multiple Alluxio clusters. Part 1 of this blog series discusses the scenario and background. Alluxio lies in between the storage and compute layers in order to … Continued

What’s New in Alluxio 2.8: Enhanced S3 API Functionality, Enterprise-grade Security and Data Migration With Better Usability and Low Cost

May 4, 2022 By Adit Madan and Hope Wang

The Alluxio 2.8 version focuses on the S3 API, enterprise-grade security, scalability and observability in data migration. Enhanced S3 API makes managing Alluxio easier than ever. Features such as encryption at rest and policy-driven data management further improve Alluxio’s functionality to support enterprise customers.

What’s New in Alluxio 2.7: Enhanced Scalability, Stability and Major Improvements in AI/ML Training Efficiency

November 16, 2021 By Adit Madan and Hope Wang

With this release, Alluxio has strengthened its position as a de-facto data unification and acceleration solution in data analytics and machine learning pipelines. The solution is optimized to support Spark, Presto, Tensorflow, and PyTorch, and is available on multiple cloud platforms such as AWS, GCP, and Azure Cloud, and also on Kubernetes in private data centers or public clouds.

Design and Implementation of Alluxio POSIX Support

August 31, 2021

Applications like Tensorflow, PyTorch can access data through Alluxio FUSE service without modifying any code just like accessing their local file systems by Unix/Linux POSIX API. This article describes the design and implementation of Alluxio FUSE service, its current status and future plans.

Tags: alluxio engineering, fuse, performance, POSIX

What’s New in Alluxio 2.6: Better Performance for AI/ML Workloads plus Increased Operating Metrics Visibility

July 1, 2021 By Adit Madan

Alluxio 2.6 significantly improves the performance of data-intensive AI/ML workloads across any storage, and also improves the general maintainability and visibility of Alluxio clusters, especially for large-scale deployments. We have taken the feedback and contributions from the community and introduced features which simplify deployment, introduce new data management capabilities, optimize performance, and provide enhanced visibility into system behavior.

Alluxio Data Orchestration for Machine Learning

April 27, 2021

Alluxio’s capabilities as a Data Orchestration framework have encouraged users to onboard more of their data-driven applications to an Alluxio powered data access layer. Driven by strong interests from our open-source community, the core team of Alluxio started to re-design an efficient and transparent way for users to leverage data orchestration through the POSIX interface.

Tags: alluxio day, alluxio engineering, data orchestration, fuse, machine learning, POSIX

Introducing what’s new in Alluxio 2.5

April 24, 2021

Alluxio 2.5 focuses on improving interface support to broaden the set of data driven applications which can benefit from data orchestration. The POSIX and S3 client interfaces have greatly improved in performance and functionality as a result of the widespread usage and demand from AI/ML workloads and system administration needs. Alluxio is rapidly evolving to meet the needs of enterprises that are deploying it as a key component of their AI/ML stacks.

Tags: alluxio engineering, data orchestration, hybrid cloud, office hour, release

Tag: alluxio engineering