Adit Madan and Parviz Peiravi offer an overview of Alluxio, a data orchestration layer that provides unified data access for hybrid and multi-cloud deployments and leverages Intel® Optane™ Persistent Memory for higher-performance caching at reduced cost. This data access layer enables distributed compute engines such as Presto, TensorFlow, and PyTorch to transparently access data from various storage systems (including S3, HDFS, and Azure) while using a multi-tier cache to accelerate data access.
Join us for this tech talk, where we will show you how Alluxio can help burst your private computing environment to Google Cloud while minimizing costs and I/O overhead. Alluxio, coupled with Dataproc, Google’s managed service for open source data and analytics processing, enables zero-copy burst for faster query performance in the cloud. You can take advantage of compute resources that are not local to your data without having to copy or sync that data.
In the on-prem days, one key performance optimization for Apache Hadoop or Apache Spark workloads was to run tasks on nodes that held the relevant HDFS data locally. However, while adopting the cloud and Kubernetes makes scaling compute workloads exceptionally easy, HDFS is often not an option. Accessing data efficiently from cloud-native storage services like AWS S3, or even from on-premises HDFS, becomes harder because data locality is lost.
Join us for this tech talk, where we’ll introduce the Starburst Presto, Alluxio, and cloud object store stack for building a highly concurrent, low-latency analytics platform.
Learn how to set up Google Cloud Dataproc with Alluxio so jobs can seamlessly read from and write to Cloud Storage. See how to run Dataproc Spark against a remote HDFS cluster.
In this tech talk, we’ll discuss why DBS turned to Alluxio’s bursting approach to help solve on-prem compute capacity challenges.
This tech talk will share approaches to bursting data to the cloud, along with how Alluxio can enable “zero-copy” bursting of Spark workloads to cloud data services like EMR and Dataproc. Learn how DBS Bank uses Alluxio to work around limited on-prem compute capacity.
In this webinar, Adit will present a new approach to bringing data locality to data-intensive compute workloads in Kubernetes environments, and demo how to set up and run Apache Spark and Alluxio in Kubernetes.
Learn how to set up EMR Spark and Hive with Alluxio so jobs can seamlessly read from and write to your S3 data lake.