Alluxio 2.5 focuses on improving interface support to broaden the set of data driven applications which can benefit from data orchestration. The POSIX and S3 client interfaces have greatly improved in performance and functionality as a result of the widespread usage and demand from AI/ML workloads and system administration needs. Alluxio is rapidly evolving to meet the needs of enterprises that are deploying it as a key component of their AI/ML stacks.
Founding Engineer, Alluxio
Alluxio 2.4.0 focuses on features critical to large scale, production deployments in Cloud and Hybrid Cloud environments. Enterprises leverage Alluxio at enormous scale in many dimensions, including number of files, total volume of data, requests per second, and number of concurrent clients.
Alluxio 2.3.0 focuses on streamlining the user experience in hybrid cloud deployments where Alluxio is deployed with compute in the cloud to access data on-prem. Features such as environment validation tools and concurrent metadata synchronization greatly improve Alluxio’s functionality. Integrations with AWS EMR, Google Dataproc, K8s, and AWS Glue make Alluxio easy to use in a variety of cloud environments. In this article, we will share some of the highlights of the release. For more, please visit our release notes page.
With this release comes the General Availability (GA) of Alluxio Structured Data Services (SDS), the subsystem of Alluxio responsible for managing and transforming structured data, such as databases, tables, and partitions.
Notice anything new about our websites? That’s right – we are super excited to launch our new website – Alluxio.io!
As we continue our focus on our open source community, one important item on our mind was to rebuild our website to provide better user experience for our community. To that end, you’ll see lots of changes in the Alluxio web experience.
In a recent blog, we discussed the ideation, design and new features in Alluxio 2.0 preview. Today we are thrilled to announce another new revolutionary project that the Alluxio engineering team has been hard at work on for the past year – the Alluxio Virtual Reality (VR) client.
Apache Spark has brought significant innovation to Big Data computing, but its results are even more extraordinary when paired with Alluxio. Alluxio, provides Spark with a reliable data sharing layer, enabling Spark to excel at performing application logic while Alluxio handles storage. Bazaarvoice uses the combination of Spark and Alluxio to provide a real time big data platform that has the ability to not only handle the intake of 1.5 billion page views during peak events like Black Friday, but also provide real time analytics against it (read more). At this scale, the gain in speed is an enabler for new workloads. We’ve established a clean and simple way to integrate Alluxio and Spark.
We are thrilled and excited to announce the availability of Alluxio 2.0 Preview Release – the largest open source release with the most new features and improvements since the creation of the project. It is now available for download.
While Alluxio already enabled data locality and data accessibility for many big data workloads in the cloud, there was still innovation needed in key areas.
An Alluxio cluster caches data from connected storage systems in memory to create a data layer that can be accessed concurrently by multiple application frameworks. This greatly improves performance for many analytics workloads. On-demand caching occurs when clients read blocks of data using a ‘CACHE’ read type from persistent storage systems connected to the Alluxio cluster.
Prior to Alluxio v1.7, on-demand caching was on the critical path of read operations, requiring a full block to be read before the data was available for the application. Workloads which read partial blocks, for example SQL workloads, would be adversely affected on initial reads from connected storage.