From limited Hadoop compute capacity to increased data scientist efficiency

Alluxio Tech Talk

This tech talk shares approaches to bursting data to the cloud and shows how Alluxio can enable "zero-copy" bursting of Spark workloads to cloud data services such as EMR and Dataproc. Learn how DBS Bank uses Alluxio to address limited on-prem compute capacity.

Why Data Orchestration?

Large-scale analytics and AI/ML applications require efficient data access, yet data is increasingly distributed across multiple data stores in private data centers and clouds. Data platform teams also need the flexibility to introduce new data sources and move to new storage options with minimal changes or downtime for their applications. This paper explores what is driving the need for an Alluxio data orchestration layer in a modern data platform, and which problems that layer solves.


Tech Talk: Accelerating analytics with EMR on your S3 data lake

EMR has become a widely used service for running big data analytics in the public cloud. However, slow and inconsistent EMR performance when reading from S3 data lakes creates challenges for organizations.

Alluxio is a data orchestration layer for the cloud that improves the performance of analytics workloads running on AWS EMR with S3 as the storage layer.

Join us for this webinar, where we will show you how to set up EMR Spark and Hive with Alluxio so that jobs can seamlessly read from and write to your S3 data lake. You'll also see the performance gains Alluxio brings to your EMR/S3 stack.


What can I do to speed up analytics performance on remote data?

Background

Today's advanced analytics applications run on more datasets than ever before, and the locations where data "lands" are becoming more dispersed. The separation of compute and storage in modern environments lends itself well to running on these distributed datasets. Data can be stored in a location remote from the compute, such as in a …