Using Alluxio to Optimize and Improve Performance of Kubernetes-Based Deep Learning in the Cloud

Featuring a case study from the Alibaba Cloud Container Service team

This article presents joint work by Alibaba, Alluxio, and Nanjing University on the challenges of training Artificial Intelligence and Deep Learning models in the cloud. We adopted a hybrid solution in a containerized environment, using a data orchestration layer to connect private data centers to cloud platforms. We analyze the performance bottlenecks that arise in this setup and describe detailed optimizations for each component of the architecture. Our goal is to reduce the cost and complexity of data access for Deep Learning training in a hybrid environment, and in doing so make Deep Learning model training in the cloud more practical.