Google’s TensorFlow and Facebook’s PyTorch are two Deep Learning frameworks that have been popular with the open source community. Although PyTorch is still a relatively new framework, many developers have successfully adopted it due to its ease of use. By default, PyTorch does not support Deep Learning model training directly in HDFS, which brings challenges … Continued
Tag: deep learning
A collaboration of Alibaba, Alluxio, and Nanjing University in tackling the problems of Deep Learning model training in the cloud. Our goal was to reduce the cost and complexity of data access for Deep Learning training in a hybrid environment, which resulted in over 40% reduction in training time and cost.
This article presents the collaborative work of Alibaba, Alluxio, and Nanjing University in tackling the problem of Artificial Intelligence and Deep Learning model training in the cloud. We adopted a hybrid solution with a data orchestration layer that connects private data centers to cloud platforms in a containerized environment. Various performance bottlenecks are analyzed with detailed optimizations of each component in the architecture.