For data-driven workloads in disaggregated stacks, there’s no native data access layer within a Kubernetes cluster. For query engines and machine learning frameworks that are deployed within a Kubernetes cluster, any critical data sitting outside the cluster breaks locality. Alluxio can help.
This webinar describes the concepts and internal mechanisms behind running a Spark + Alluxio stack in Kubernetes to improve data locality, even when the storage service is outside the cluster or remote.
In the on-prem days, one key performance optimization for Apache Hadoop or Apache Spark workloads was to run tasks on nodes with local HDFS data. However, while adoption of the cloud and Kubernetes makes scaling compute workloads exceptionally easy, HDFS is often not an option. Efficiently accessing data from cloud-native storage services like AWS S3, or even from on-premises HDFS, becomes harder as data locality is lost.
This office hour describes the concept and dataflow of running a Spark/Alluxio stack in Kubernetes with enhanced data locality, even when the storage service is outside the cluster or remote.
One important performance optimization in Apache Spark is to schedule tasks on the nodes whose local HDFS data nodes serve the task's input data. However, more users are running Apache Spark natively on Kubernetes, where HDFS is not an option. This office hour describes the concept and dataflow of running a Spark/Alluxio stack in Kubernetes with enhanced data locality, even when the storage service is outside the cluster or remote.
Learn about Alibaba’s use case in deep learning and gene computing acceleration using Alluxio in Kubernetes.
Kubernetes is widely used across enterprises to orchestrate computation. And while Kubernetes helps improve flexibility and portability for computation in public/hybrid cloud environments across infrastructure providers, running data-intensive workloads can be challenging.
Moving data closer to frameworks such as Spark or Presto, co-locating data with them, and accessing data from multiple or remote clouds are all hard to do efficiently. That’s where Alluxio, an open source data orchestration platform, can help.
Alluxio enables data locality with your Spark and Presto workloads for faster performance and better data accessibility in Kubernetes. It also provides portability across storage providers.
In this on-demand tech talk, we’ll give a quick overview of Alluxio and the use cases it powers for Spark/Presto in Kubernetes. We’ll also show you how to set up Alluxio and Spark/Presto to run in Kubernetes.
This tech talk gives a quick overview of Alluxio and the use cases it powers for Spark/Presto in Kubernetes. We also show you how to set up Alluxio and Spark/Presto to run in Kubernetes.
Alluxio is available via Docker, so you can create an Alluxio cluster within a Kubernetes cluster. Given these containers, you can use either a DaemonSet or a ReplicaSet within the Kubernetes cluster to create the Alluxio cluster itself and have it co-located with your other nodes that may be …
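To make the DaemonSet option mentioned above more concrete, here is a minimal sketch in Python that builds a Kubernetes DaemonSet manifest for Alluxio workers and prints it as JSON. It is an illustration under stated assumptions, not the setup shown in the talk: the function name `alluxio_worker_daemonset`, the resource name `alluxio-worker`, the image `alluxio/alluxio`, and the `worker` container argument are all assumed here. A real deployment would also need master addresses, storage mounts, and resource settings.

```python
import json


def alluxio_worker_daemonset(name="alluxio-worker",
                             image="alluxio/alluxio",
                             namespace="default"):
    """Build a bare-bones DaemonSet manifest as a plain dict.

    A DaemonSet schedules one pod per eligible node, which is what places
    an Alluxio worker next to the compute pods running on that node.
    The name, image, and args defaults are assumptions for illustration.
    """
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": name, "namespace": namespace, "labels": labels},
        "spec": {
            # The selector must match the pod template labels.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": "worker",
                            "image": image,     # assumed image name
                            "args": ["worker"],  # assumed entrypoint argument
                        }
                    ]
                },
            },
        },
    }


if __name__ == "__main__":
    # Print the manifest as JSON, a format kubectl accepts as well as YAML.
    print(json.dumps(alluxio_worker_daemonset(), indent=2))
```

Because `kubectl` accepts JSON manifests, the printed output could be piped to `kubectl apply -f -` on a test cluster. Swapping `"kind"` to a ReplicaSet (and adding a `replicas` count) would give a fixed number of workers instead of one per node, at the cost of the per-node co-location that a DaemonSet provides.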