Optimizing Query Performance by Decoupling Presto and Hive Data Warehouse

Ideally, Presto would access data independently from how the data was originally stored or managed. Alluxio, as a data orchestration layer provides the physical data independence, for Presto to interact with the data more efficiently. In addition to caching for IO acceleration, Alluxio also provides a catalog service to abstract the metadata in the Hive Metastore, and transformations to expose the data in compute-optimized way. In this talk, we describe some of the challenges of using Presto with Hive, and introduce Alluxio data orchestration for solving those challenges.

Tags: , , , , , , ,

Burst Presto & Spark workloads to AWS EMR with no data copies

Community Online Office Hour *

In this talk, we will show you how to leverage any public cloud (AWS, Google Cloud Platform, or Microsoft Azure) to scale analytics workloads directly on on-prem data without copying and synchronizing the data into the cloud.

Optimizing Query Performance by Decoupling Presto and Hive Data Warehouse

Community Online Office Hour *

Alluxio, as a data orchestration layer provides the physical data independence, for Presto to interact with the data more efficiently. In addition to caching for IO acceleration, Alluxio also provides a catalog service to abstract the metadata in the Hive Metastore, and transformations to expose the data in compute-optimized way. In this talk, we describe some of the challenges of using Presto with Hive, and introduce Alluxio data orchestration for solving those challenges.

Tech Talk: Accelerating analytics with EMR on your S3 data lake

EMR has become a widely used service to run big data analytics in the public cloud. But issues around slow/inconsistent EMR performance due to S3 data lakes creates challenges for organizations.

Alluxio is a data orchestration layer for the cloud that increases performance of analytic workloads running on AWS EMR using S3 as the storage. 

Join us for this webinar where we will show you how to set up EMR Spark and Hive with Alluxio so jobs can seamlessly read from and write to your S3 data lake. You’ll see the performance gains with Alluxio in your EMR/S3 stack.

Tags: , , , , ,

Tech Talk: Accelerate and Scale Big Data Analytics with Disaggregated Compute and Storage

The ever increasing challenge to process and extract value from exploding data with AI and analytics workloads makes a memory centric architecture with disaggregated storage and compute more attractive. This decoupled architecture enables users to innovate faster and scale on-demand. Enterprises are also increasingly looking towards object stores to power their big data & machine learning workloads in a cost-effective way. However, object stores don’t provide big data compatible APIs as well as the required performance. 

In this webinar, the Intel and Alluxio teams will present a proposed reference architecture using Alluxio as the in-memory accelerator for object stores to enable modern analytical workloads such as Spark, Presto, Tensorflow, and Hive. We will also present a technical overview of Alluxio.

Tags: , , , , , , ,