Accelerating Data Computation on Ceph Objects using Alluxio

In this talk, we will present how, using Alluxio, computation and storage ecosystems can interact better by benefiting from the “bringing the data close to the code” approach. Moving away from the complete disaggregation of computation and storage, data locality can improve computation performance. We will also present our observations and test results, which show significant improvements in accelerating Spark data analytics on Ceph object storage using Alluxio.

Speeding Up Spark Performance using Alluxio at China Unicom

Unicom’s traditional batch architecture consists mainly of IOE, Hive, and Greenplum systems. As the business has grown, a large number of siloed (“chimney-like”), decentralized computing applications built for diverse scenarios have emerged. To solve the resulting resource fragmentation, we have introduced a unified computing platform with Spark and Alluxio at its core. Alluxio plays an important role in accelerating data processing and keeping processing stable.

Adopting Satellite Clusters with Alluxio at Vipshop to Improve Spark Jobs for Targeted Advertising by 30x

As the third-largest e-commerce site in China, Vipshop processes large amounts of data collected daily to generate targeted advertisements for its consumers. In this article, Gang Deng from Vipshop describes how to meet SLAs by improving struggling Spark jobs on HDFS by up to 30x, and optimize hot data access with Alluxio to create …

Bursting Spark or Presto Jobs to AWS using Alluxio

In this office hour, we demonstrate how a “zero-copy burst” solution helps to speed up Spark and Presto queries in the public cloud while eliminating the process of manually copying and synchronizing data from the on-premise data lake to cloud storage. This approach allows compute frameworks to decouple from on-premise data sources and scale efficiently by leveraging Alluxio and public cloud resources such as AWS.
