Alibaba, Alluxio, and Nanjing University collaborated to tackle the problems of deep learning model training in the cloud. Our goal was to reduce the cost and complexity of data access for deep learning training in a hybrid environment; the work resulted in a reduction of over 40% in training time and cost.
Alluxio Resources
Find our rich collection of White Papers, Case Studies, Presentations, and Videos here.




This article describes how Alluxio can accelerate the training of deep learning models in a hybrid cloud environment when using Intel’s Analytics Zoo open source platform, powered by oneAPI. It covers the new architecture and workflow, as well as Alluxio’s performance benefits and benchmark results.
Are you using SQL engines, such as Presto, to query an existing Hive data warehouse and experiencing challenges such as an overloaded Hive Metastore with slow and unpredictable access, unoptimized data formats and layouts (for example, too many small files), or a lack of influence over the existing Hive system and other Hive applications?
Ideally, Presto would access data independently from how the data was originally stored or managed. Alluxio, as a data orchestration layer, provides the physical … Continued
Accessing data to run analytic workloads in Spark across data centers and/or clouds can be challenging. Additionally, network I/O can bottleneck Spark jobs that … Continued
Building distributed systems is no small feat. Software testing is just one of many critical practices that engineers who build these systems need to … Continued
Many organizations are leveraging EMR to run big data analytics on the public cloud. However, reading and writing data to S3 directly can result in … Continued
This talk will overview two projects at Electronic Arts (EA) that address the mismatch by data orchestration: One project automatically generates configurations for all … Continued
The Alluxio 2.0 release was the biggest update since the birth of the project “Tachyon” from UC Berkeley’s AMPLab. Gathering feedback from our Open Source … Continued
For data-driven workloads in disaggregated stacks, there’s no native data access layer within a Kubernetes cluster. For query engines and machine learning frameworks that are … Continued
This webinar will describe the concept and internal mechanism using the stack of Spark+Alluxio in Kubernetes to enhance data locality even when the storage … Continued
Join us for this tech talk where we’ll introduce the Starburst Presto, Alluxio, and cloud object store stack for building a highly-concurrent and low-latency … Continued