This whitepaper introduces how to speed up end-to-end distributed training in the cloud using Alluxio to accelerate data access. With the help of Alluxio, loading data from cloud storage, training and caching data can be done in a transparent and distributed way as a part of the training process. This whitepaper also demonstrates how to set up and benchmark the end-to-end performance of the training process, along with a comparison of other popular approaches.
Many companies have leveraged Alluxio to level up their current Presto platform, including Facebook, TikTok, Electronic Arts, Walmart, Tencent, Comcast, and more. They have gained significant benefits with Alluxio integrated into their Presto stack.
Alluxio is the data orchestration platform to unify data silos across heterogeneous environments. The following blog will discuss the architecture combining Spark with Alluxio.
Alluxio started as a virtual distributed file system, a research project out of the AMPLab at U.C. Berkeley. Alluxio foresaw the need for agility when accessing large data stores separated from compute engines like Hadoop or Spark.
Fast forward several years and over a thousand committers later, and Alluxio has blossomed into the industry’s leading data orchestration platform for analytics and AI/ML. But as with any new type of technology, figuring out the best ways to use it depends on your data environment, computational workloads, issues, and goals.
This blog is the first in a series introducing Alluxio as the data platform to unify data silos across heterogeneous environments. The next blog will include insights from PrestoDB committer Beinan Wang to uncover the value for analytics use cases, specifically with PrestoDB as the compute engine.
Alluxio’s capabilities as a Data Orchestration framework have encouraged users to onboard more of their data-driven applications to an Alluxio powered data access layer. Driven by strong interests from our open-source community, the core team of Alluxio started to re-design an efficient and transparent way for users to leverage data orchestration through the POSIX interface.
Alluxio is a leading data orchestration platform that offers a compute agnostic, storage agnostic, and cloud agnostic solution for big data and machine learning applications. Aunalytics is a data platform company delivering Insights-as-a-Service to answer enterprise and mid-sized companies’ most important IT and business questions.