Adopting Satellite Clusters with Alluxio at Vipshop to Improve Spark Jobs for Targeted Advertising by 30x

As the third largest e-commerce site in China, Vipshop processes large amounts of data collected daily to generate targeted advertisements for its consumers. In this article, Gang Deng from Vipshop describes how to meet SLAs by improving struggling Spark jobs on HDFS by up to 30x, and optimize hot data access with Alluxio to create … Continued

Running Presto/Trino in a Hybrid Cloud Architecture

Migrating SQL workloads from a fully on-premise environment to cloud infrastructure has numerous benefits, including alleviating resource contention and reducing costs by paying for computation resources on an on-demand basis. In the case of Presto running on data stored in HDFS, the separation of compute in the cloud and storage on-premises is apparent since Presto’s … Continued

Introducing Alluxio 2.3

Alluxio 2.3.0 focuses on streamlining the user experience in hybrid cloud deployments where Alluxio is deployed with compute in the cloud to access data on-prem. Features such as environment validation tools and concurrent metadata synchronization greatly improve Alluxio’s functionality. Integrations with AWS EMR, Google Dataproc, K8s, and AWS Glue make Alluxio easy to use in a variety of cloud environments. In this article, we will share some of the highlights of the release. For more, please visit our release notes page.

Accelerating and Scaling Big Data Analytics with Alluxio and Intel® Optane™ Persistent Memory

International Data Corporation (IDC) reported that the global datasphere will grow from 33 zettabytes in 2018 to 175 zettabytes by 2025. This trend becomes more and more complicated with the variety and velocity of data growth, and it continuously changes the ways how data is collected, stored, processed and analyzed. New analytics solutions from machine … Continued