spark Archives | Page 6 of 12

Tech Talk: Accelerating Analytics with EMR on your S3 Data Lake

September 12, 2019

This tech talk shows how to set up EMR Spark and Hive with Alluxio to seamlessly read from and write to your S3 data lake, and covers the resulting performance benefits.

Tags: aws s3, emr, spark, tech talk

Building a Cloud Native Stack with EMR Spark, Alluxio, and S3

Alluxio Community Office Hour * August 27, 2019

Learn how to set up EMR Spark with Alluxio so Spark jobs can seamlessly read from and write to S3. See a performance comparison between Spark on S3 and Spark with Alluxio on S3.

Accelerating Spark with Kubernetes

Alluxio Tech Talk * August 7, 2019

This tech talk gives a quick overview of Alluxio and the use cases it powers for Spark/Presto in Kubernetes. We also show you how to set up Alluxio and Spark/Presto to run in Kubernetes.

Accelerating Write-intensive Data Workloads on AWS S3

August 7, 2019 By Zac Blanco and Bin Fan

Alluxio is an open-source data orchestration system widely used to speed up data-intensive workloads in the cloud. Alluxio v2.0 introduced Replicated Async Write, which lets applications complete writes to the Alluxio file system and return quickly, while the data is persisted in the background to the chosen under storage, such as S3.

Bay Area Meetup: Alluxio 2.0 Deep Dive and Near Real-time Analytics with Spark

July 23, 2019

This meetup presents an overview of the motivations and design decisions behind the major changes in the Alluxio 2.0 release, followed by a talk on real-time data processing for sales attribution analysis with Alluxio, Spark, and Hive at VIPShop.

Tags: alluxio engineering, apache hadoop, apache spark, compute, compute storage separation, data, data orchestration, hadoop, hdfs, meetup, scale, spark, storage
