compute Archives

Architecting a Heterogeneous Data Platform Across Clusters, Regions, and Clouds

January 27, 2022

Data platform teams are increasingly challenged with accessing multiple data stores that are separated from compute engines, such as Spark, Presto, TensorFlow or PyTorch. Whether your data is distributed across multiple datacenters and/or clouds, a successful heterogeneous data platform requires efficient data access. Alluxio enables you to embrace the separation of storage from compute and use Alluxio data orchestration to simplify adoption of the data lake and data mesh paradigms for analytics and AI/ML workloads.

Tags: ai, analytics, cloud, compute, data orchestration, data platform, data stores, ml, storage

Architecting a Heterogeneous Data Platform Across Clusters, Regions, and Clouds

Alluxio Product School * January 27, 2022

Whether your data is distributed across multiple datacenters and/or clouds, a successful heterogeneous data platform requires efficient data access. Alluxio enables you to embrace the separation of storage from compute and use Alluxio data orchestration to simplify adoption of the data lake and data mesh paradigms for analytics and AI/ML workloads.

Accelerating Data Computation on Ceph Objects using Alluxio

December 13, 2020

In this talk, we will present how using Alluxio computation and storage ecosystems can better interact benefiting of the “bringing the data close to the code” approach. Moving away from the complete disaggregation of computation and storage, data locality can enhance the computation performance. During this talk, we will present our observations and testing results that will show important enhancements in accelerating Spark Data Analytics on Ceph Objects Storage using Alluxio.

Tags: ceph, compute, data orchestration, data orchestration summit, spark, storage

Accelerating Data Computation on Ceph Objects using Alluxio

November 11, 2020

In this talk, we will present how using Alluxio computation and storage ecosystems can better interact benefiting the “bringing the data close to the code” approach. Moving away from the complete disaggregation of computation and storage, data locality can enhance the computation performance. During this talk, we will present our observations and testing results that will show important enhancements in accelerating Spark Data Analytics on Ceph Objects Storage using Alluxio.

Tags: ceph, compute, data locality, distributed storage, meetup, object storage, storage

Bursting Spark or Presto Jobs to AWS using Alluxio

June 23, 2020

In this office hour, we demonstrate how a “zero-copy burst” solution helps to speed up Spark and Presto queries in the public cloud while eliminating the process of manually copying and synchronizing data from the on-premise data lake to cloud storage. This approach allows compute frameworks to decouple from on-premise data sources and scale efficiently by leveraging Alluxio and public cloud resources such as AWS.

Tags: aws, cloud storage, compute, hdfs, hybrid cloud, office hour, performance, presto, spark, zero copy bursting

Bursting Spark or Presto Jobs to AWS using Alluxio

Community Online Office Hour * June 23, 2020

Tag: compute