ALLUXIO COMMUNITY NEWSLETTER

MAY 2020

What’s to come in JUNE

Join the Alluxio community and ecosystem experts online for open discussions
Jun 9 | Open Online Office Hour

This is a casual online video chat, and all attendees are welcome to bring their own questions. Join Bin Fan and our open source core maintainers in this bi-weekly Q&A session. We will have suggested topics, such as the top challenges of using popular compute frameworks like Presto and Spark to access remote data, and the latest developments in Alluxio open source.

Bursting Spark or Presto Jobs to AWS using Alluxio
Jun 23 | Alluxio Office Hour

In this office hour, we demonstrate how a “zero-copy burst” solution speeds up Spark and Presto queries in the public cloud. It also removes the need to manually copy and synchronize data from the on-premises data lake to cloud storage. This approach decouples compute frameworks from on-premises data sources, so they can scale efficiently using Alluxio and public cloud resources such as AWS.

Recap of MAY

On Demand | Build a Hybrid Data Lake and Burst Processing to Google Cloud Dataproc with Alluxio

A joint tech talk with Google Cloud on how Alluxio can help you burst workloads from your private computing environment to Google Cloud while minimizing costs and I/O overhead. A demo of running Alluxio with Dataproc is included!

On Demand | Optimizing Latency-Sensitive Queries for Presto at Facebook: A Collaboration Between Presto & Alluxio

For many latency-sensitive SQL workloads, Presto's performance is often limited by the time spent retrieving remote data. In this talk, Rohit Jain from Facebook and Bin Fan from Alluxio describe how their teams collaborated to add a local, SSD-backed Alluxio cache inside Presto workers at Facebook, improving queries whose latency was not meeting expectations.

On Demand | Spark + Alluxio: Case Analysis of Data Localization Optimization for K8s and China Unicom Big Data System (China Online Meetup)

This event was held in Chinese and featured Zhang Ce from China Unicom and Jiacheng Liu from Alluxio. Learn more about data locality for Spark + Alluxio on Kubernetes.

On Demand | Accelerating Queries on Cloud Data Lakes

Topics range from limited Hadoop compute capacity to data scientist efficiency. With Alluxio’s “zero-copy” burst solution, companies can bridge remote data centers and data lakes with computing frameworks in other locations. This lets them offload compute to the cloud and take advantage of its flexibility, scalability, and power for their remote data.

Good Reads

Blog | Burst data lake processing to Dataproc using on-prem Hadoop data

Blog | Efficient Model Training in the Cloud with Kubernetes, TensorFlow, and Alluxio

Blog | Intel Analytics Zoo + Alluxio to Accelerate Deep Learning in Hybrid Cloud

Whitepaper | Using Alluxio to Optimize and Improve Performance of Kubernetes-Based Deep Learning in the Cloud

Whitepaper | Accelerating deep learning on Apache Spark in a cloud environment with the Intel Analytics Zoo + Alluxio stack

Join our Slack channel!

Get your questions answered by the experts in our Slack community channel