Accelerating Hive with Alluxio on S3

Alluxio Community Office Hour *

Hear about Bazaarvoice’s use case leveraging Apache Spark, Hive, and Alluxio on S3. And learn how to set up Hive with Alluxio so that Hive jobs can seamlessly read/write to S3.

Four Different Ways to Write to Alluxio

Alluxio is a new layer on top of under storage systems that can not only improve raw I/O performance but also enables applications flexible options to read, write and manage files. This article focuses on describing different ways to write files to Alluxio, realizing the tradeoffs in performance, consistency, and also the level of fault tolerance compared to HDFS.

Accelerating Write-intensive Data Workloads on AWS S3

Alluxio is an open-source data orchestration system widely used to speed up data-intensive workloads in the cloud. Alluxio v2.0 introduced Replicated Async Write to allow users to complete writes to Alluxio file system and return quickly with high application performance, while still providing users with peace of mind that data will be persisted to the chosen under storage like S3 in the background.

Am I bottlenecking on S3, S3 throttling my requests?

When working with big data analytics and AI, you are likely reading and writing terabytes from S3, in some cases even at very high data transfer rates. There are several reasons why you may find S3 is slowing down your data-intensive job.