Tech Talk Slide Deck

Using Alluxio as a Fault-tolerant Pluggable Optimization Component of JD.com’s Computation Frameworks


STRATA DATA CONFERENCE NY 2018

JD.com is China’s largest online retailer and its biggest overall retailer, as well as the country’s biggest internet company by revenue. JD.com’s BDP platform currently runs more than 400,000 jobs (15+ PB) daily on a system with more than 15,000 nodes and a total capacity of 210 PB.

Alluxio, formerly Tachyon, is the world’s first system that unifies disparate storage systems at memory speed. In the big data ecosystem, Alluxio lies between computation frameworks or jobs and various kinds of storage systems. Additionally, Alluxio’s memory-centric architecture enables data access orders of magnitude faster than existing solutions.

Alluxio has run in JD.com’s production environment on 100 nodes for six months. Tao Huang, Mang Zhang, and Bing Bai (白冰) explain how JD.com uses Alluxio to support ad hoc and real-time stream computing, using Alluxio-compatible HDFS URLs and treating Alluxio as a pluggable optimization component. To give just one example, one framework, JDPresto, has seen an average 10x performance improvement. The team also extended Alluxio and improved synchronization between Alluxio and HDFS to keep the two consistent.
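Treating a caching layer as a fault-tolerant, pluggable optimization implies that jobs keep working when that layer misses or fails. The abstract does not describe how this is implemented; the following is only a minimal Python sketch of the general read-fallback pattern, assuming the fast tier (standing in for Alluxio) and the backing store (standing in for HDFS) are exposed as plain read functions. `FallbackReader`, `fast_read`, `store_read`, and the example paths are hypothetical and are not Alluxio or HDFS APIs.

```python
from typing import Callable, Dict, Optional


class FallbackReader:
    """Hypothetical sketch: serve reads from an optional fast tier (standing in
    for Alluxio) and fall back to the backing store (standing in for HDFS) on a
    miss or when the fast tier is unavailable."""

    def __init__(self,
                 fast_read: Callable[[str], Optional[bytes]],
                 store_read: Callable[[str], bytes]) -> None:
        self.fast_read = fast_read
        self.store_read = store_read
        self.fallbacks = 0  # number of reads served by the backing store

    def read(self, path: str) -> bytes:
        try:
            data = self.fast_read(path)  # None signals a miss in the fast tier
        except OSError:
            # The fast tier is only an optimization: if it is down, degrade to
            # the backing store instead of failing the job.
            data = None
        if data is None:
            self.fallbacks += 1
            data = self.store_read(path)
        return data


# Usage with in-memory stand-ins (all paths and data are made up).
store: Dict[str, bytes] = {"/warehouse/t1/part-0": b"rows"}


def broken_fast_tier(path: str) -> Optional[bytes]:
    raise ConnectionError("fast tier unavailable")  # ConnectionError is an OSError


reader = FallbackReader(broken_fast_tier, store.__getitem__)
print(reader.read("/warehouse/t1/part-0"))  # b'rows'
print(reader.fallbacks)  # 1
```

Keeping the backing store as the source of truth is what makes the fast tier removable: the job's correctness never depends on the cache, only its speed does.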