Bursting Spark or Presto Jobs to AWS using Alluxio

Community Online Office Hour *

In this office hour, we demonstrate how a “zero-copy burst” solution helps to speed up Spark and Presto queries in the public cloud while eliminating the process of manually copying and synchronizing data from the on-premise data lake to cloud storage. This approach allows compute frameworks to decouple from on-premise data sources and scale efficiently by leveraging Alluxio and public cloud resources such as AWS.

“Zero-Copy” Hybrid Cloud for Data Analytics – Strategy, Architecture and Benchmark Report

This whitepaper details how to leverage a public cloud, such as Amazon AWS, Google GCP, or Microsoft Azure to scale analytic workloads directly on data on-premises without copying and synchronizing the data into the cloud. We will show an example of what it might look like to run on-demand Presto and Hive with Alluxio in the public cloud using on-prem HDFS. We will also show how to set up and execute performance benchmarks in two geographically dispersed Amazon EMR clusters along with a summary of our findings.

Tags: , , , , , , , , , ,

Burst Presto & Spark workloads to AWS EMR with no data copies

Community Online Office Hour *

In this talk, we will show you how to leverage any public cloud (AWS, Google Cloud Platform, or Microsoft Azure) to scale analytics workloads directly on on-prem data without copying and synchronizing the data into the cloud.

Testing Distributed System at Scale for the Cost of a Large Pizza on AWS

Community Online Office Hour *

Building distributed systems is no small feat. Software testing is just one of many critical practices that engineers who build these systems need to utilize to ensure the quality and usability of their software. For distributed systems, scaling out testing frameworks to ensure that enterprises who run our in highly distributed environments is a complicated (and expensive task!)

Speeding Up I/O for Machine Learning

Alluxio Global Online Meetup *

This talk will guide the audience on how Alluxio can greatly simplify the data preparation phase in with remote and possibly multiple data sources. We will share the lessons and benchmark from Bill Zhao an engineer led in Apple when building a Machine Learning platform using Tensorflow, NFS, DC/OS and Alluxio.

Tech Talk: Integrating Google Cloud Dataproc with Alluxio for faster performance in the cloud

Google Cloud Dataproc is a widely used fully managed Spark and Hadoop service to run big data analytics and compute workloads in the cloud. Services like Dataproc reduce hardware spend, eliminate the need to overbuy capacity, and provide business agility. Yet users still face challenges for performance sensitive workloads or workloads running on remote data. 

Alluxio is an open source cloud data orchestration platform that increases performance of analytic workloads running on Dataproc by intelligently caching data and bringing back lost data locality. Alluxio also enables users to run compute workloads against on-prem storage like Hadoop HDFS without any app changes. 

Chris Crosbie and Roderick Yao from the Google Dataproc team and Dipti Borkar of Alluxio demo how to set up Google Cloud Dataproc with Alluxio so jobs can seamlessly read from and write to Cloud Storage. They also show how to run Dataproc Spark against a remote HDFS cluster. 

Tags: , , ,

Tech Talk: The Path to Migrating off MapR

If you’re a MapR user, you might have concerns with your existing data stack. Whether it’s the complexity of Hadoop, financial instability and no future MapR product roadmap, or no flexibility when it comes to co-locating storage and compute, MapR may no longer be working for you. 

Alluxio can help you migrate to a modern, disaggregated data stack using any object store with the similar performance of Hadoop plus significant cost savings.

Join us for this tech talk where we’ll discuss how to separate your compute and storage on-prem and architect a new data stack that makes your object store the core. We’ll show you how to offload your MapR/HDFS compute to any object store and how to run all of your existing jobs as-is on Alluxio + object store.

Tags: , , ,