Alluxio on EMR: Fast Storage Access and Sharing for Spark Jobs

This is a guest blog by Chengzhi Zhao, republished from his original blog post. Traditionally, to run a single Spark job on EMR, you might follow these steps: launch a cluster, run the job, which reads data from a storage layer such as S3, perform transformations on an RDD/DataFrame/Dataset, and finally write the result back to S3.
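To make that traditional flow concrete, here is a minimal sketch of such a Spark job in Scala. The bucket paths, dataset layout, and column names (`user_id`, `event_count`) are hypothetical placeholders, not taken from the original post.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object S3SparkJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("traditional-emr-spark-job")
      .getOrCreate()

    // Read input data directly from S3 (EMR ships with an S3 connector).
    val events = spark.read.parquet("s3://example-input-bucket/events/")

    // A simple DataFrame transformation: count events per user.
    val counts = events
      .groupBy("user_id")
      .agg(count("*").as("event_count"))

    // Write the result back to S3; every run pays the S3 round trip again.
    counts.write.mode("overwrite").parquet("s3://example-output-bucket/event-counts/")

    spark.stop()
  }
}
```

Because each job in this pattern reads from and writes to S3 independently, repeated or shared access to the same data pays the remote storage cost every time, which is the pain point Alluxio addresses.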