Accelerate Spark and Hive Jobs on AWS S3 by 10x with Alluxio Tiered Storage

In this article, Thai Bui from Bazaarvoice describes how Bazaarvoice uses Alluxio to build a tiered storage architecture with AWS S3 that maximizes performance and minimizes the operating cost of running big data analytics on AWS EC2.
Takeaways: Common performance and cost challenges in building an efficient big data analytics platform on AWS; How to set up the Hive metastore to use Alluxio as the storage tier for “hot tables” while all tables remain on AWS S3 as the source of truth; How to set up tiered storage within Alluxio based on ZFS and NVMe on EC2 instances to maximize read performance; Benchmark results for micro and real-world workloads.