Alluxio on EMR: Fast Storage Access and Sharing for Spark Jobs

This is a guest blog by Chengzhi Zhao with an original blog source. Traditionally, if you want to run a single Spark job on EMR, you might follow these steps: launch a cluster, run the job, which reads data from a storage layer such as S3, perform transformations within an RDD/DataFrame/Dataset, and finally send the result back to S3. … Continued

Building a cloud-native analytics MPP database with Alluxio

This article walks through the journey of HashData, a startup in Beijing, to build a cloud-native, high-performance MPP shared-everything architecture that uses object storage as the data persistence layer and Alluxio as the data orchestration layer in the cloud. HashData was founded in 2016 by a group of open source data veterans from Pivotal, Teradata, IBM, … Continued