Two Sigma Meetup Recap – Achieving Compute and Storage Independence for Data-driven Workloads

This is a recap of the joint Two Sigma and Alluxio meetup hosted in New York. Two Sigma is a leading hedge fund that uses cutting-edge technology to train its models on petabytes of data held in on-premise storage. Special thanks to Two Sigma for hosting. Here are the slides from the presentation.

In this meetup, Bin Fan from Alluxio and Wenbo Zhao from Two Sigma co-presented a reference stack, running Alluxio as a data access layer for Apache Spark, that lets compute and storage be separated and scaled independently for big data and machine learning workloads. Two Sigma’s use case is a great example of what this reference stack offers: bursting machine learning computation to the public cloud while still accessing data stored on-premise efficiently. Their data scientists want to use the public cloud as a scalable, elastic compute resource to speed up the end-to-end model training process. By using Alluxio as the data access layer co-located with compute in the cloud, their researchers achieved 10x faster end-to-end processing, which lets them perform more iterations on their models.
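To make the idea of a data access layer co-located with compute more concrete, here is a minimal toy sketch in Python. It is not Alluxio's API and not the stack shown in the talk. The class names `RemoteStore` and `ReadThroughCache` and the path `/data/prices.csv` are hypothetical, invented for illustration. It assumes a read-through caching model, where the first read of a file goes to remote storage and repeated reads are served locally.

```python
class RemoteStore:
    """Hypothetical stand-in for on-premise storage reached over a slow link."""

    def __init__(self, blobs):
        self._blobs = dict(blobs)
        self.reads = 0  # number of round trips made to the remote side

    def read(self, path):
        self.reads += 1
        return self._blobs[path]


class ReadThroughCache:
    """Toy data access layer sitting next to compute (not Alluxio's API)."""

    def __init__(self, remote):
        self._remote = remote
        self._local = {}  # data already fetched and kept close to compute

    def read(self, path):
        # Fetch from remote storage only on the first access to a path.
        if path not in self._local:
            self._local[path] = self._remote.read(path)
        return self._local[path]


# Hypothetical dataset: one small file living in "on-premise" storage.
remote = RemoteStore({"/data/prices.csv": b"ticker,price\nABC,1.0\n"})
layer = ReadThroughCache(remote)

# Five training iterations read the same input through the access layer.
for _ in range(5):
    payload = layer.read("/data/prices.csv")
```

In this toy model, only the first of the five iterations touches remote storage, which is the intuition behind why repeated model iterations can benefit from keeping data near compute.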

We had a great time interacting with the audience on the East Coast, and we look forward to the next NYC event!

To stay up to date on future events, join our meetup groups: Alluxio Open Source New York Meetup and Alluxio Open Source Bay Area Meetup.

If you are interested in hosting or presenting at a future event, please contact us at community@alluxio.com.