On-demand compute clusters are often used to avoid the cost of running and maintaining a continuously running cluster for ad-hoc analysis. They also offer significant savings in storage, since data can be kept in a much cheaper medium, such as object storage. However, one critical downside prevents on-demand compute clusters from becoming the norm for sporadic data analytics: poor performance. Without co-located compute and storage, queries and analyses may take unacceptably long, greatly reducing the value of the insights they produce.
To address this limitation, Alluxio can be used as a lightweight data access layer on the compute nodes, bringing performance up to memory speed without requiring a long-running cluster. This talk will summarize why Alluxio’s architecture makes it a perfect fit for completing the on-demand cluster puzzle.