In the on-prem days, a key performance optimization for Apache Hadoop and Apache Spark workloads was to schedule tasks on the nodes holding the relevant HDFS blocks. While adoption of the cloud and Kubernetes makes scaling compute workloads exceptionally easy, HDFS is often not an option there, and data locality is lost: efficiently accessing data in cloud-native storage services like AWS S3, or in a remote on-premises HDFS cluster, becomes harder.
Join us for this tech talk, where we’ll introduce the Starburst Presto, Alluxio, and cloud object store stack for building a highly concurrent, low-latency analytics platform.
Learn how to set up Google Cloud Dataproc with Alluxio so jobs can seamlessly read from and write to Cloud Storage. See how to run Dataproc Spark against a remote HDFS cluster.
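The key idea in such a setup is that Spark jobs address data through an `alluxio://` URI instead of `gs://` or `hdfs://`, with the Alluxio client jar on the Spark classpath. As a minimal sketch (the hostname, jar path, and helper function below are illustrative assumptions, not details from the talk):

```python
# Hypothetical sketch: Spark properties and path scheme commonly used so
# Dataproc jobs read/write through an Alluxio master. The hostname and
# jar location are assumptions; 19998 is Alluxio's default master RPC port.
ALLUXIO_MASTER = "alluxio-master"  # illustrative hostname
ALLUXIO_PORT = 19998

def alluxio_path(path):
    """Rewrite a logical dataset path into an alluxio:// URI for Spark."""
    return f"alluxio://{ALLUXIO_MASTER}:{ALLUXIO_PORT}/{path.lstrip('/')}"

spark_conf = {
    # Put the Alluxio client jar on the driver/executor classpath
    # (the jar path shown is illustrative).
    "spark.driver.extraClassPath": "/opt/alluxio/client/alluxio-client.jar",
    "spark.executor.extraClassPath": "/opt/alluxio/client/alluxio-client.jar",
}

print(alluxio_path("datasets/events.parquet"))
# → alluxio://alluxio-master:19998/datasets/events.parquet
```

A job then calls, say, `spark.read.parquet(alluxio_path(...))` unchanged, whether the under-storage is Cloud Storage or a remote HDFS cluster.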
In this tech talk, we’ll discuss why DBS turned to Alluxio’s bursting approach to help solve on-prem compute capacity challenges.
This tech talk will share approaches to bursting data to the cloud, and show how Alluxio can enable “zero-copy” bursting of Spark workloads to cloud data services like Amazon EMR and Google Cloud Dataproc. Learn how DBS Bank uses Alluxio to work around limited on-prem compute capacity.
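“Zero-copy” here means the on-prem store is mounted into the Alluxio namespace, so cloud Spark jobs address Alluxio paths while data is fetched and cached on demand rather than copied wholesale up front. The toy model below illustrates the mount-table idea only; the mount points and URIs are hypothetical, and this is not Alluxio's actual implementation:

```python
# Toy model of an Alluxio-style mount table illustrating "zero-copy"
# bursting: paths in one namespace resolve, by longest-prefix match,
# to different under-storage systems. All URIs below are hypothetical.
mounts = {
    "/": "s3://burst-bucket/",                           # root under-storage
    "/onprem": "hdfs://onprem-namenode:8020/warehouse",  # mounted on-prem HDFS
}

def resolve(path):
    """Map a namespace path to its under-storage URI (longest-prefix match)."""
    best = max(
        (m for m in mounts
         if path == m or path.startswith(m.rstrip("/") + "/")),
        key=len,
    )
    return mounts[best].rstrip("/") + path[len(best.rstrip("/")):]

print(resolve("/onprem/sales/2020"))  # served from on-prem HDFS, cached near compute
print(resolve("/scratch/tmp.csv"))   # served from the cloud bucket
```

Cloud-side jobs keep using one logical namespace; only the blocks actually read cross the wire, which is what makes the bursting effectively copy-free.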
In this webinar, Adit will present this new approach of bringing data locality to data-intensive compute workloads in Kubernetes environments, and demo how to set up and run Apache Spark and Alluxio in Kubernetes.
Learn how to set up EMR Spark and Hive with Alluxio so jobs can seamlessly read from and write to your S3 data lake.
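In a setup like this, an S3 bucket is typically mounted into the Alluxio namespace, and Spark or Hive jobs swap `s3://` URIs for `alluxio://` ones. A small sketch of that path translation, assuming a hypothetical mount point `/s3` and master address (neither is from the talk):

```python
# Hypothetical helper: translate an s3:// URI into the alluxio:// URI it
# would have if the bucket were mounted at mount_point on this master.
# Bucket name, mount point, and master address are all illustrative.
def to_alluxio(s3_uri, mount_point="/s3", master="alluxio-master:19998"):
    """Map s3://bucket/key to alluxio://master<mount_point>/key."""
    assert s3_uri.startswith("s3://"), "expected an s3:// URI"
    bucket, _, key = s3_uri[len("s3://"):].partition("/")
    return f"alluxio://{master}{mount_point}/{key}"

print(to_alluxio("s3://my-bucket/logs/2020/01.json"))
# → alluxio://alluxio-master:19998/s3/logs/2020/01.json
```

Hive tables can point their `LOCATION` at the same `alluxio://` path, so Spark and Hive share Alluxio's cache in front of the S3 data lake.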
This tech talk gives a quick overview of Alluxio and the use cases it powers for Spark/Presto in Kubernetes. We also show you how to set up Alluxio and Spark/Presto to run in Kubernetes.
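One piece of a Kubernetes setup like this is co-locating executors with Alluxio workers so reads can stay node-local, commonly by sharing a host directory (for the worker's domain socket) into the compute pods. The fragment below builds only that part of a pod spec as a Python dict; the names and paths are assumptions for illustration, not the talk's actual manifests:

```python
# Hypothetical fragment of a Spark-on-Kubernetes executor pod spec that
# shares a hostPath directory with a co-located Alluxio worker, so
# short-circuit (node-local) reads can bypass the network.
# The directory, volume name, and container name are illustrative.
DOMAIN_SOCKET_DIR = "/opt/domain"  # hostPath shared with the Alluxio worker

pod_patch = {
    "spec": {
        "volumes": [{
            "name": "alluxio-domain",
            "hostPath": {"path": DOMAIN_SOCKET_DIR, "type": "Directory"},
        }],
        "containers": [{
            "name": "spark-executor",
            "volumeMounts": [{
                "name": "alluxio-domain",
                "mountPath": DOMAIN_SOCKET_DIR,
            }],
        }],
    }
}
```

The same pattern applies to Presto worker pods; scheduling them onto nodes running Alluxio workers is what restores the locality discussed in the talk.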
We will introduce the key new features and enhancements, including support for hyper-scale data workloads, machine learning and deep learning workloads, and better storage abstraction.