Alibaba, Alluxio, and Nanjing University collaborated to tackle the problems of deep learning model training in the cloud. Our goal was to reduce the cost and complexity of data access for deep learning training in a hybrid environment; the work resulted in a reduction of over 40% in training time and cost.
This article describes how Alluxio can accelerate the training of deep learning models in a hybrid cloud environment when using Intel’s Analytics Zoo open source platform, powered by oneAPI. It covers the new architecture and workflow, as well as Alluxio’s performance benefits and benchmark results.
Are you using SQL engines, such as Presto, to query an existing Hive data warehouse and running into challenges such as an overloaded Hive Metastore with slow and unpredictable access, unoptimized data formats and layouts (for example, too many small files), or a lack of influence over the existing Hive system and other Hive applications?
Applications like TensorFlow and PyTorch can access data through the Alluxio FUSE service without modifying any code, just like accessing their local file systems by Unix/Linux … Continued
Alluxio’s distributed systems experts explore today’s data access challenges and open source data orchestration solutions for modernizing your data platform. … Continued
Alluxio is a leading data orchestration platform that offers a compute-agnostic, storage-agnostic, and cloud-agnostic solution for big data and machine learning … Continued
Alluxio has an excellent metrics system and supports various kinds of metrics sinks, e.g. an embedded JSON sink and the Prometheus sink. Users and developers … Continued
Nowadays it is not straightforward to integrate Alluxio with popular query engines like Presto on existing Hive data. Solutions proposed by the community like … Continued
RaptorX is the internal name of a project that aims to reduce query latency significantly beyond what vanilla Presto is capable of. For this session, we introduce … Continued
Today’s analytics workloads demand real-time access to expansive amounts of data. This session demonstrates how Alluxio’s data orchestration platform, running on Intel Optane persistent … Continued
RAPIDS is a set of open source libraries enabling GPU-aware scheduling and memory representation for analytics and AI. Spark 3.0 uses RAPIDS for … Continued