What do I do if Hadoop is slow?

Your Hadoop cluster has been running fine, and then users start reporting that their jobs are running slowly. This answer covers common reasons for the slowdown and some ways to address it.

From limited Hadoop compute capacity to increased data scientist efficiency

Alluxio Tech Talk

This tech talk shares approaches to bursting data to the cloud, along with how Alluxio can enable “zero-copy” bursting of Spark workloads to cloud data services like EMR and Dataproc. Learn how DBS Bank uses Alluxio to work around limited on-prem compute capacity.

Powering Data Science and AI with Apache Spark, Alluxio, and IBM

Alluxio Global Online Meetup

In this online meetup, we will present the benefits of the fast analytics stack of Spark on Alluxio, and dive into China Unicom’s use case of leveraging Spark and Alluxio to process massive amounts of mobile data.

Effective Analytical Pipelines on AWS Using EMR, Alluxio, and S3

This article describes lessons I learned from a previous project that moved a data pipeline, originally running on a Hadoop cluster managed by my team, to AWS using EMR and S3. The goal was to leverage the elasticity of EMR to offload the operational work, and to make S3 a data lake where different teams can easily share data across projects.