This article describes how Alluxio can accelerate the training of deep learning models in a hybrid cloud environment when using Intel’s Analytics Zoo open source platform, powered by oneAPI. Details on the new architecture and workflow, as well as Alluxio’s performance benefits and benchmark results, will be discussed.
This whitepaper details how to leverage a public cloud, such as Amazon AWS, Google GCP, or Microsoft Azure, to scale analytic workloads directly on on-premises data without copying and synchronizing that data into the cloud. We will show an example of what it might look like to run on-demand Presto and Hive with Alluxio in the public cloud using on-prem HDFS. We will also show how to set up and execute performance benchmarks in two geographically dispersed Amazon EMR clusters, along with a summary of our findings.
Alluxio, an open source data orchestration technology, helps speed up Dataproc workloads by providing a distributed caching layer in the Dataproc cluster.
The DBS team was tasked to solve their compute capacity problem. They wanted to provide faster insights and analyze data for a range of use cases but didn’t have the ability to scale compute elastically on-prem.
One use case that challenged them was customer call analysis. With the millions of customer calls they get every year, DBS manages over 50TB of customer data and audio files. This data needed to reside on-prem for compliance reasons. With on-prem compute limitations, they looked to the public cloud to analyze this data and selected “zero-copy” bursting as the best approach.
In this tech talk, we’ll discuss why DBS turned to Alluxio’s bursting approach to help solve on-prem compute capacity challenges.
Want to leverage your existing investments in Hadoop with your data on-premises and still benefit from the elasticity of the cloud?
Like other Hadoop users, you most likely run very large and busy Hadoop clusters, particularly when it comes to compute capacity. Bursting HDFS data to the cloud can bring challenges: network latency impacts performance, copying data via DistCp means maintaining duplicate data, and you may have to make application changes to accommodate the use of S3.
“Zero-copy” hybrid bursting with Alluxio keeps your data on-prem and syncs data to compute in the cloud so you can expand compute capacity, particularly for ephemeral Spark jobs.