A collaboration among Alibaba, Alluxio, and Nanjing University to tackle the problems of deep learning model training in the cloud. Our goal was to reduce the cost and complexity of data access for deep learning training in a hybrid environment. The result was a reduction of over 40% in training time and cost.
This article presents the joint work of Alibaba, Alluxio, and Nanjing University on the problem of training artificial intelligence and deep learning models in the cloud. We adopted a hybrid solution in a containerized environment, with a data orchestration layer that connects private data centers to cloud platforms. We analyze the performance bottlenecks we encountered and detail the optimizations made to each component of the architecture.
For data-driven workloads in disaggregated stacks, there's no native data access layer within a Kubernetes cluster. For query engines and machine learning frameworks deployed within a Kubernetes cluster, any critical data sitting outside the cluster breaks locality. Alluxio can help.
This webinar describes the concept and internal mechanisms of using a Spark + Alluxio stack in Kubernetes to enhance data locality, even when the storage service is outside the cluster or remote.
In the on-prem days, one key performance optimization for Apache Hadoop and Apache Spark workloads was to run tasks on nodes holding the relevant HDFS data locally. However, while adopting the cloud and Kubernetes makes scaling compute workloads exceptionally easy, HDFS is often not an option. Accessing data efficiently from cloud-native storage services such as AWS S3, or even from on-premises HDFS, becomes harder because data locality is lost.
One important performance optimization in Apache Spark is to schedule tasks on nodes where a local HDFS DataNode serves the task's input data. However, more and more users are running Apache Spark natively on Kubernetes, where HDFS is not an option. This office hour describes the concept and dataflow of using a Spark/Alluxio stack in Kubernetes to enhance data locality, even when the storage service is outside the cluster or remote.
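To make the idea of locality-aware scheduling concrete, here is a minimal, simplified sketch of greedy task placement: each task lists the hosts that hold its input data, and the scheduler prefers a free slot on one of those hosts before falling back to any free slot. The `Task` class, `assign_tasks` function, and host names are hypothetical illustrations, not Spark's or Alluxio's API. Spark's real scheduler is more involved (for example, it waits up to `spark.locality.wait` before degrading the locality level). In the Alluxio setup described above, the preferred hosts would presumably be the nodes whose co-located Alluxio worker caches the input.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Task:
    task_id: str
    # Hypothetical: hosts that hold this task's input data locally.
    preferred_hosts: set[str]


def assign_tasks(tasks: list[Task], free_slots: dict[str, int]) -> dict[str, tuple[str, str]]:
    """Greedy locality-aware placement (simplified; no delay scheduling).

    Prefer a free slot on a host that holds the task's data (NODE_LOCAL);
    otherwise fall back to any host with a free slot (ANY).
    """
    free = dict(free_slots)  # host -> number of free executor slots
    placement: dict[str, tuple[str, str]] = {}
    for task in tasks:
        local = [h for h in sorted(task.preferred_hosts) if free.get(h, 0) > 0]
        if local:
            host, level = local[0], "NODE_LOCAL"
        else:
            candidates = [h for h in sorted(free) if free[h] > 0]
            if not candidates:
                break  # no slots left; remaining tasks would wait
            host, level = candidates[0], "ANY"
        free[host] -= 1
        placement[task.task_id] = (host, level)
    return placement
```

For example, with one slot each on `node-a` and `node-b`, a task whose data lives on `node-a` runs there node-locally, a task whose data lives on an unavailable `node-c` loses locality and lands on `node-b`, and a third task waits because no slots remain. That loss of locality in the second case is the situation the abstracts describe when storage sits outside the Kubernetes cluster.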
Learn about Alibaba’s use case in deep learning and gene computing acceleration using Alluxio in Kubernetes.
Kubernetes is widely used across enterprises to orchestrate computation. While Kubernetes improves flexibility and portability for computation in public and hybrid cloud environments across infrastructure providers, running data-intensive workloads on it can be challenging.
Moving data closer to Spark or Presto, co-locating data with these frameworks, and accessing data across multiple or remote clouds are all hard to do efficiently. That's where Alluxio, an open source data orchestration platform, can help.
Alluxio enables data locality with your Spark and Presto workloads for faster performance and better data accessibility in Kubernetes. It also provides portability across storage providers.
In this on-demand tech talk, we'll give a quick overview of Alluxio and the use cases it powers for Spark/Presto in Kubernetes. We'll also show you how to set up Alluxio and Spark/Presto to run in Kubernetes.