analytic workloads Archives

Build a hybrid data lake and burst processing to Google Cloud Dataproc with Alluxio

Alluxio Tech Talk * May 28, 2020

Join us for this tech talk, where we will show you how Alluxio can help you burst your private computing environment to Google Cloud while minimizing costs and I/O overhead. Alluxio, coupled with Dataproc, Google’s managed service for open source data and analytics processing engines, enables zero-copy bursting for faster query performance in the cloud, so you can take advantage of compute resources that are not local to your data without having to manage copying or syncing that data.

Burst Presto & Spark workloads to AWS EMR with no data copies

Community Online Office Hour * April 28, 2020

In this talk, we will show you how to leverage any public cloud (AWS, Google Cloud Platform, or Microsoft Azure) to scale analytics workloads directly on on-prem data without copying and synchronizing the data into the cloud.

Tags: analytic workloads, cloud, hdfs, hybrid cloud, office hour, presto, public cloud, spark

Bursting Apache Spark Workloads to the Cloud on Remote Data

Community Online Office Hour * March 10, 2020

Accessing data to run analytic workloads in Spark across data centers and/or clouds can be challenging. In addition, network I/O can become the bottleneck for Spark jobs that need to read large amounts of data. A common solution is to deploy an HDFS cluster close to Spark as a caching layer, manually copy the input data into HDFS before the job runs, and purge it afterward. But this ETL process is both time-consuming and error-prone.
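To make the manual process described above concrete, here is a minimal Python sketch that only builds the three commands (copy, run, purge) without executing them. The helper name `manual_staging_plan`, the host names, the paths, and `job.py` are all hypothetical, and the sketch assumes the Spark application takes its input path as its first argument.

```python
from typing import List


def manual_staging_plan(remote_input: str, hdfs_cache: str, job_script: str) -> List[List[str]]:
    """Return the copy, run, and purge commands of the manual workflow, in order.

    Illustrative only: nothing is executed, and all arguments are placeholders.
    """
    return [
        # 1. Copy the input from the remote store into the HDFS cluster near Spark.
        ["hadoop", "distcp", remote_input, hdfs_cache],
        # 2. Run the Spark job against the staged copy
        #    (assumes the hypothetical job reads its input path from argv).
        ["spark-submit", job_script, hdfs_cache],
        # 3. Purge the staged copy once the job has finished.
        ["hdfs", "dfs", "-rm", "-r", "-skipTrash", hdfs_cache],
    ]


if __name__ == "__main__":
    plan = manual_staging_plan(
        "hdfs://onprem-nn:8020/warehouse/events",  # hypothetical remote source
        "hdfs://cloud-nn:8020/staging/events",     # hypothetical cache location
        "job.py",                                  # hypothetical Spark application
    )
    for command in plan:
        print(" ".join(command))
```

Each of the three steps has to be scheduled, monitored, and retried separately, and a stale or partially copied staging directory silently changes the job's input, which is where the time and error cost of this approach tends to come from.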

Tag: analytic workloads