Case Studies Archives

Speed up Large-scale ML/DL Offline Inference Jobs with Alluxio at Microsoft Bing

January 6, 2022 By Binyang Li and Qianxi Zhang

Running inference at scale is challenging. In this blog, we share our observations and practices in using Alluxio to improve I/O performance for large-scale ML/DL offline inference at Microsoft Bing.

Speeding Up the Atlas Supercomputing Platform with Fluid + Alluxio

November 8, 2021 By Dongdong Lv and Qingsong Liu

Unisound is an artificial intelligence company focusing on Internet of Things services. Unisound’s AI technology stack combines perception and expression capabilities for signals, voices, images, and text with cognitive technologies such as knowledge, understanding, analysis, and decision-making, working toward a multi-modal AI system. Atlas is the supercomputing platform that supports all of these AI applications, including model training and inference.

Bursting Your On-Premises Data Lake Analytics and AI Workloads on AWS

January 26, 2021 By Adit Madan

This post outlines a solution for building a hybrid data lake with Alluxio to leverage analytics and AI on Amazon Web Services (AWS) alongside a multi-petabyte on-premises data lake. Alluxio’s solution is called “zero-copy” hybrid cloud, indicating a cloud migration approach without first copying data to Amazon Simple Storage Service (Amazon S3).

Building High-Performance Data Lake Using Apache Hudi and Alluxio at T3Go

November 20, 2020 By Trevor Zhang (T3Go), Vino Yang (T3Go), Jasmine Wang and Bin Fan

This article describes how T3Go’s high-performance data lake, built on Apache Hudi and Alluxio, shortened data ingestion time by up to a factor of 2. Data analysts using Presto, Hudi, and Alluxio together to query data on the lake saw queries run 10 times faster.

Building a high-performance platform on AWS to support real-time gaming services using Presto and Alluxio

August 4, 2020 By Teng Wang (Electronic Arts), Du Li (Electronic Arts), Yu Jin (Electronic Arts) and Sundeep Narravula (Electronic Arts)

This blog explores an innovative platform that uses Presto as the computing engine and Alluxio as a data orchestration layer between Presto and S3 storage to support online services that need instantaneous responses in the gaming industry. The preliminary results show that Presto with Alluxio significantly outperforms Presto reading directly from S3 in all cases. Alluxio with metadata caching shows up to a 5.9x performance gain when handling large numbers of small files.

Reducing Large S3 API Costs Using Alluxio

July 30, 2020 By Juraj Pohanka (datasapiens), Koen Michiels (datasapiens) and Sam Gilbert (datasapiens)

This article describes how engineers at datasapiens brought down S3 API costs by 200x by implementing Alluxio as a data orchestration layer between S3 and Presto.

Adopting Satellite Clusters with Alluxio at Vipshop to Improve Spark Jobs for Targeted Advertising by 30x

July 25, 2020 By Gang Deng (Vipshop) and Jasmine Wang

As the third-largest e-commerce site in China, Vipshop processes large amounts of data collected daily to generate targeted advertisements for its consumers. In this article, Gang Deng from Vipshop describes how to meet SLAs by speeding up struggling Spark jobs on HDFS by up to 30x, and how to optimize hot data access with Alluxio to create …

Building a Cross-Region Hybrid Cloud Storage Gateway for Machine Learning & AI at WeRide

July 8, 2020 By Derek Tan (WeRide) and Jasmine Wang

In this blog, Derek Tan, Executive Director of Infra & Simulation at WeRide, describes how engineers use Alluxio as a hybrid cloud data gateway that lets on-premises applications access public cloud storage such as AWS S3.
