Accelerating Cloud Pipelines with Alluxio and Fast Durable Writes

Using Alluxio, data can be shared between pipeline stages at memory speed. By reading and writing data in Alluxio, the data can stay in memory for the next stage of the pipeline, and this can greatly increase the performance. Alluxio Enterprise Edition (AEE) introduces Fast Durable Writes, a feature which enables low latency and fault-tolerant writes. In this article, we describe the Fast Durable Writes feature, and explore how Alluxio can be deployed and used with a data pipeline.

Apache Spark DataFrame caching with Alluxio

Many organizations deploy Alluxio together with Spark for performance gains and data manageability benefits. Qunar recently deployed Alluxio in production, and their Spark streaming jobs sped up by 15x on average and up to 300x during peak times. They noticed that some Spark jobs would slow down or would not finish, but with Alluxio, those jobs could finish quickly. In this blog post, we investigate how Alluxio helps Spark be more effective. Alluxio increases performance of Spark jobs, helps Spark jobs perform more predictably, and enables multiple Spark jobs to share the same data from memory.

Tags: , , , ,