Fast Big Data Analytics and Machine Learning Using Alluxio and Spark in Baidu


A few months ago, Baidu deployed Alluxio to accelerate its big data analytics workloads. Bin Fan and Haojun Wang explain why Baidu chose Alluxio and detail how they achieved a 30x speedup in a production environment of hundreds of machines. Building on the success of the big data analytics engine, Baidu is now expanding its Alluxio and Spark infrastructure to accelerate other applications, such as machine learning.

Bin and Haojun also delve into how they built a heterogeneous computing platform to accelerate deep learning workloads. This platform consists of heterogeneous computing resources (CPU, GPU, FPGA) managed by a heterogeneous computing layer, as well as heterogeneous storage resources (memory, SSD, HDD) managed by Alluxio.

Bin Fan, VP Open Source and Founding Engineer at Alluxio
Haojun Wang, Tech Lead at Baidu