Whitepaper: Using Alluxio to Improve the Performance and Consistency of HDFS Clusters

Alluxio is the world’s first memory-speed virtual distributed storage system. It bridges applications and underlying storage systems, providing unified data access orders of magnitude faster than existing solutions. The Hadoop Distributed File System (HDFS) is a distributed file system for storing large volumes of data. HDFS popularized the paradigm of bringing computation to data and the co-located compute and storage architecture. We used Spark 2.0 for computation and compared the performance of two stacks: one where Spark jobs were run directly on data in HDFS, and another where Spark jobs were run on data in an Alluxio file system backed by HDFS.

Learn how Alluxio is used in clusters with co-located compute and storage to improve two key metrics of data analytics clusters:

  • Performance predictability, which makes SLAs easier to meet.
  • Performance, improved by up to 10x.