This talk describes a stack of open-source projects to serve high-concurrent and low-latency SQL queries using Presto with Alluxio on big data in the cloud. Deploying Alluxio as a data orchestration layer to access cloud storage object storage (e.g., AWS S3), this architecture greatly enhances the data locality of Presto with distributed and cross-query caching, thus avoids reading the same data repeatedly from the cloud storage.
Tag: alluxio engineering
This article goes through a simple example to illustrate how Structured Data Management available in the latest Alluxio 2.1.0 release to help SQL and structured data workloads.
This article introduces Structured Data Management (Developer Preview) available in the latest Alluxio 2.1.0 release, a new effort to provide further benefits to SQL and structured data workloads using Alluxio.
For today’s blog post I interviewed Bin Fan, Founding Engineer and VP of Open Source at Alluxio. Bin is the PMC maintainer of the Alluxio open source project. Prior to Alluxio, he worked for Google on the next-generation storage infrastructure.
This tutorial describes steps to set up an EMR cluster with Alluxio as a distributed caching layer for Hive, and run sample queries to access data in S3 through Alluxio.
Learn more about Bazaarvoice’s use case leveraging Apache Spark, Hive, and Alluxio on S3. Along with how to set up Hive with Alluxio so that Hive jobs can seamlessly read from/write to S3.
Bay Area Meetup which include presentations on the architecture of Presto, its separation of compute and storage, cloud-readiness, recent advancements in the project such as Cost-Based Optimizer and Kubernetes Support. Presto and Alluxio production use cases and more.
Grafana, a comprehensive metrics visualization software, ties into this process by pulling the metrics that systems like Alluxio collect through a sink and visualizes them in a more helpful fashion. This guide will cover how to set up Grafana and Graphite, a supported sink for Alluxio that will put metrics in a time-series database, along with exploring some of the possibilities that the combination offers.
Alluxio is an open-source data orchestration system widely used to speed up data-intensive workloads in the cloud. Alluxio v2.0 introduced Replicated Async Write to allow users to complete writes to Alluxio file system and return quickly with high application performance, while still providing users with peace of mind that data will be persisted to the chosen under storage like S3 in the background.