Achieving 10x acceleration of Spark and Hive Jobs on AWS S3 with Alluxio Tiered Storage

The data engineering team at Bazaarvoice, a software-as-a-service digital marketing company based in Austin, Texas, must handle data at Internet scale to serve its customers. Facing challenges with scaling up storage capacity and provisioning hardware, the team turned to Alluxio’s tiered storage system and saw 10x acceleration of its Spark and Hive jobs running on AWS S3.

In this whitepaper you’ll learn:

  • How to build a big data analytics platform on AWS that includes technologies like Hive, Spark, Kafka, Storm, Cassandra, and more
  • How to set up a Hive metastore using a storage tier for hot tables
  • How to leverage tiered storage to maximize read performance

Accelerate Spark and Hive Jobs on AWS S3 by 10x with Alluxio Tiered Storage

In this article, Thai Bui from Bazaarvoice describes how Bazaarvoice uses Alluxio to build a tiered storage architecture with AWS S3, maximizing performance and minimizing the operating costs of running big data analytics on AWS EC2. This blog post is an abbreviated version of the full-length technical whitepaper (coming soon), which aims to provide the following takeaways: Common … Continued