CNCF Member Webinar: Improving Data Locality for Analytics Jobs on Kubernetes Using Alluxio

CNCF Webinar

In the on-prem days, one key performance optimization for Apache Hadoop or Apache Spark workloads was to run tasks on nodes holding the HDFS data locally. However, while adoption of the cloud and Kubernetes makes scaling compute workloads exceptionally easy, HDFS is often not an option. Efficiently accessing data from cloud-native storage services like AWS S3, or even from on-premises HDFS, becomes harder as data locality is lost.

Improving Data Locality for Spark Jobs on Kubernetes Using Alluxio

Alluxio Community Office Hour

One important performance optimization in Apache Spark is to schedule tasks on nodes where a local HDFS DataNode serves the task's input data. However, more users are running Apache Spark natively on Kubernetes, where HDFS is often not an option. This office hour describes the concept and dataflow of running the Spark/Alluxio stack in Kubernetes with enhanced data locality, even when the storage service is external or remote.

Improving Memory Utilization of Spark Jobs Using Alluxio

Alluxio Community Office Hour

This office hour shares a demo and compares two approaches: caching data in memory directly inside the Spark JVM versus storing data off-heap in an in-memory storage service like Alluxio.

Simplified Data Preparation for Machine Learning in Hybrid and Multi Clouds

ODSC WEST 2019

Cloud storage brings great flexibility in management and cost-efficiency to data scientists, but also introduces new challenges related to data accessibility and data locality for machine learning applications. For instance, when the input data is stored in a remote cloud storage service like AWS S3 or Azure Blob Storage, direct data access is … Continued

Improving Spark Memory Resource with Off-Heap In-Memory Storage

In the previous tutorial, “Getting Started with Spark Caching using Alluxio in 5 Minutes”, we demonstrated how to get started with Spark and Alluxio. To share more thoughts and experiments on how Alluxio enhances Spark workloads, this article focuses on how Alluxio helps optimize the memory utilization of Spark applications. For users who are … Continued