Effective Spark with Alluxio

Alluxio, formerly Tachyon, is a memory-speed virtual distributed storage system that leverages memory to store data and to accelerate access to data in different storage systems. Alluxio has a rapidly growing open source community of developers and users, and it is deployed at organizations such as Alibaba, Baidu, Barclays, Intel, Huawei, and Qunar. Many of these deployments use Alluxio with Spark, and some of them scale to petabytes of data. Spark is already gaining wide adoption, and Alluxio can make it even more effective: Alluxio bridges Spark applications with various storage systems and further accelerates data-intensive applications. In this talk, we briefly introduce Alluxio, present several ways Alluxio can help Spark be more effective, show benchmark results with Spark RDDs and DataFrames, and describe production deployments in which Alluxio and Spark work together. We will also give live demos of some of these use cases.