Accelerating Spark Workloads in a Mesos Environment

MESOSCON EUROPE 2017

Organizations use Mesos and Apache Spark together to gain insight from large amounts of data. Spark often processes data stored in disparate public cloud storage, such as Amazon S3, Microsoft Azure Blob Storage, or Google Cloud Storage, as well as on-premises data in HDFS, Ceph, or ECS. Because data and compute are not co-located in this architecture, performance is sub-optimal.

Alluxio, a memory-speed virtual distributed storage system, can be deployed on Mesos to connect any compute framework, such as Apache Spark, to storage systems through a unified namespace. It enables applications to interact with any data at memory speed, can eliminate the pains of ETL and data duplication, and enables new workloads across all data. Gene will discuss how Mesos, Spark, and Alluxio fit together to achieve an optimal architecture for enterprises.
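
To make the unified namespace idea concrete, here is a minimal Python sketch of how a single path tree might map onto several backing stores through a mount table. It is only an illustration of the concept, not Alluxio's implementation or API. The mount points, host name, and bucket names are all hypothetical.

```python
# Illustrative model of a unified namespace. This is NOT Alluxio code;
# every mount point, host, and bucket name below is a made-up example.
MOUNTS = {
    "/": "hdfs://namenode:8020/alluxio",   # hypothetical root store
    "/s3": "s3a://example-bucket/data",    # hypothetical S3 mount
    "/gcs": "gs://example-bucket/logs",    # hypothetical GCS mount
}


def resolve(path, mounts=MOUNTS):
    """Map a namespace path to a backing-store URI by longest-prefix match."""
    best = None
    for mount_point in mounts:
        prefix = mount_point.rstrip("/")
        # A mount matches if the path is the mount point itself
        # or lies underneath it (component-wise, not just by string prefix).
        if path == mount_point or path.startswith(prefix + "/"):
            if best is None or len(mount_point) > len(best):
                best = mount_point
    if best is None:
        raise KeyError("no mount point covers " + path)
    # Strip the mount point from the path and append the rest to the store URI.
    relative = path[len(best.rstrip("/")):].lstrip("/")
    base = mounts[best].rstrip("/")
    return base + "/" + relative if relative else base


if __name__ == "__main__":
    for p in ("/s3/events/2017.parquet", "/gcs/app.log", "/tmp/x"):
        print(p, "->", resolve(p))
```

In this model an application sees only paths such as `/s3/events/2017.parquet`, and the storage layer decides which system holds the bytes. A Spark job would typically address such a path through an `alluxio://` URI rather than calling a function like the hypothetical `resolve` above.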