Bursting Spark or Presto Jobs to AWS using Alluxio

Community Online Office Hour

In this office hour, we demonstrate how a “zero-copy burst” solution speeds up Spark and Presto queries in the public cloud while eliminating the need to manually copy and synchronize data from the on-premises data lake to cloud storage. This approach allows compute frameworks to decouple from on-premises data sources and scale efficiently by leveraging Alluxio and public cloud resources such as AWS.

Accelerate and Scale Big Data Analytics with Alluxio and Intel® Optane™ Persistent Memory

International Data Corporation (IDC) has projected that the global datasphere will grow from 33 zettabytes in 2018 to 175 zettabytes by 2025 [1]. The variety and velocity of that growth compound the challenge and continuously change how data is collected, stored, processed, and analyzed. New analytics solutions, including machine learning, deep learning, and artificial intelligence (AI), along with new architectures and tools, are being developed to extract and deliver value from this huge datasphere.


Build a hybrid data lake and burst processing to Google Cloud Dataproc with Alluxio

Alluxio Tech Talk

Join us for this tech talk where we will show you how Alluxio can help burst your private computing environment to Google Cloud, minimizing costs and I/O overhead. Alluxio coupled with Google’s open source data and analytics processing engine, Dataproc, enables zero-copy burst for faster query performance in the cloud so you can take advantage of resources that are not local to your data, without the need for managing the copying or syncing of that data.

Optimizing Query Performance by Decoupling Presto and Hive Data Warehouse

Ideally, Presto would access data independently of how that data was originally stored or managed. Alluxio, as a data orchestration layer, provides this physical data independence so that Presto can interact with the data more efficiently. In addition to caching for I/O acceleration, Alluxio provides a catalog service that abstracts the metadata in the Hive Metastore, as well as transformations that expose the data in a compute-optimized form. In this talk, we describe some of the challenges of using Presto with Hive and introduce Alluxio data orchestration as a way to address them.
