The Alluxio 2.8 version focuses on the S3 API, enterprise-grade security, scalability and observability in data migration. Enhanced S3 API makes managing Alluxio easier than ever. Features such as encryption at rest and policy-driven data management further improve Alluxio’s functionality to support enterprise customers.
Software Engineer, Alluxio
This article highlights synergy between the two widely adopted open-source projects, Alluxio and Presto, and demonstrates how together they deliver a self-serve data architecture across clouds.
With this release, Alluxio has strengthened its position as a de-facto data unification and acceleration solution in data analytics and machine learning pipelines. The solution is optimized to support Spark, Presto, Tensorflow, and PyTorch, and is available on multiple cloud platforms such as AWS, GCP, and Azure Cloud, and also on Kubernetes in private data centers or public clouds.
Alluxio is the data orchestration platform to unify data silos across heterogeneous environments. The following blog will discuss the architecture combining Spark with Alluxio.
This blog is the first in a series introducing Alluxio as the data platform to unify data silos across heterogeneous environments. The next blog will include insights from PrestoDB committer Beinan Wang to uncover the value for analytics use cases, specifically with PrestoDB as the compute engine.
Alluxio 2.6 significantly improves the performance of data-intensive AI/ML workloads across any storage, and also improves the general maintainability and visibility of Alluxio clusters, especially for large-scale deployments. We have taken the feedback and contributions from the community and introduced features which simplify deployment, introduce new data management capabilities, optimize performance, and provide enhanced visibility into system behavior.
Alluxio 2.5 focuses on improving interface support to broaden the set of data driven applications which can benefit from data orchestration. The POSIX and S3 client interfaces have greatly improved in performance and functionality as a result of the widespread usage and demand from AI/ML workloads and system administration needs. Alluxio is rapidly evolving to meet the needs of enterprises that are deploying it as a key component of their AI/ML stacks.
Data processing is increasingly making use of NVIDIA computing for massive parallelism. Advancements in accelerated compute mean that access to storage must also be quicker, whether in analytics, artificial intelligence (AI), or machine learning (ML) pipelines.
This post outlines a solution for building a hybrid data lake with Alluxio to leverage analytics and AI on Amazon Web Services (AWS) alongside a multi-petabyte on-premises data lake. Alluxio’s solution is called “zero-copy” hybrid cloud, indicating a cloud migration approach without first copying data to Amazon Simple Storage Service (Amazon S3).