Alluxio 2.3 was just released at the end of June 2020. Calvin and Bin will go over the new features and integrations available and share learnings from the community. Any questions about the release and on-going community feature development are welcome.
Tag: alluxio engineering
Alluxio (alluxio.io) is an open-source data orchestration system that provides a single namespace federating multiple external distributed storage systems. It is critical for Alluxio to be able to store and serve the metadata of all files and directories from all mounted external storage both at scale and at speed.
This talk shares our design, implementation, and optimization of Alluxio metadata service (master node) to address the scalability challenges.
Today, many people run deep learning applications with training data from separate storage such as object storage or remote data centers. This presentation will demo the Intel Analytics Zoo + Alluxio stack, an architecture that enables high performance while keeping cost and resource efficiency balanced without network being I/O bottlenecked.
For many latency-sensitive SQL workloads, Presto is often bound by retrieving distant data. In this talk, Rohit Jain, James Sun from Facebook and Bin Fan from Alluxio will introduce their teams’ collaboration on adding a local on-SSD Alluxio cache inside Presto workers at Facebook to improve queries with unsatisfied latency.
Ideally, Presto would access data independently from how the data was originally stored or managed. Alluxio, as a data orchestration layer provides the physical data independence, for Presto to interact with the data more efficiently. In addition to caching for IO acceleration, Alluxio also provides a catalog service to abstract the metadata in the Hive Metastore, and transformations to expose the data in compute-optimized way. In this talk, we describe some of the challenges of using Presto with Hive, and introduce Alluxio data orchestration for solving those challenges.
This article goes through a simple example to illustrate how Structured Data Management available in the latest Alluxio 2.2.0 release to help SQL and structured data workloads.
This article introduces Structured Data Management available in the latest Alluxio 2.2.0 release, a new effort to provide further benefits to SQL and structured data workloads using Alluxio.
With this release comes the General Availability (GA) of Alluxio Structured Data Services (SDS), the subsystem of Alluxio responsible for managing and transforming structured data, such as databases, tables, and partitions.
This talk describes a stack of open-source projects to serve high-concurrent and low-latency SQL queries using Presto with Alluxio on big data in the cloud. Deploying Alluxio as a data orchestration layer to access cloud storage object storage (e.g., AWS S3), this architecture greatly enhances the data locality of Presto with distributed and cross-query caching, thus avoids reading the same data repeatedly from the cloud storage.