Fast Big Data Analytics with Spark on Tachyon in Baidu




In this talk we will focus on how Tachyon can help improve the efficiency of big data analytics (ad-hoc queries) within Baidu. The details follow.

Currently within Baidu, we have a production Tachyon cluster with 100 nodes and over 2 PB of storage space. This cluster mainly serves as the cache layer for our Big Data Analytics engine. In this talk, we first introduce the Big Data Analytics infrastructure within Baidu. Then, we explain why we started using Tachyon a few months ago, as well as the problems we encountered when we started using it. Next, we delve into the details of how Tachyon helps accelerate our Big Data Analytics pipeline in its current state. Finally, we discuss the new features we want to see and our plan to scale further.


Shaoshan Liu is currently a Senior Architect at Baidu U.S.A., working on Big Data Infrastructure. Before Baidu, he worked at LinkedIn and Microsoft. Shaoshan has a Ph.D. from UC Irvine.