Alluxio Community Newsletter

JUNE 2020

Alluxio 2.3 is here! Alluxio 2.3.0 focuses on streamlining the user experience in hybrid cloud deployments where Alluxio is deployed with compute in the cloud to access data on-prem. Features such as the mount wizard and concurrent metadata synchronization greatly improve Alluxio’s functionality. Integrations with AWS EMR, Google Dataproc, K8s, and AWS Glue make Alluxio easy to use in a variety of cloud environments.

Release notes | Download

Upcoming Events

Building Under File System in Alluxio with Tencent
Jun 30 | Alluxio Global Online Meetup

Baolong Mao from Tencent shares his experience developing the Apache Ozone Under File System (UFS), showing how to create a new Alluxio Under File System in a few steps with minimal lines of code. A UFS can connect to any file system or object store, so users can mount different storage systems such as AWS S3 or HDFS into the Alluxio namespace.

Join the Alluxio community and ecosystem experts online for open discussions
Jul 9 | Open Online Office Hour

This is a casual online video chat where all attendees are welcome to bring their own questions. Join Bin Fan and our Open Source core maintainers in this bi-weekly Q&A session. We will have suggested topics, such as the top challenges around leveraging popular compute frameworks including Presto and Spark to access remote data, and the latest developments in Alluxio open source.

What’s New in Alluxio 2.3
Jul 14 | Alluxio Community Office Hour

Alluxio 2.3 was released at the end of June 2020. Calvin and Bin will go over the new features and integrations available and share lessons learned from the community. Any questions about the release and ongoing community feature development are welcome.

RECAP OF JUNE

On Demand | Bursting Spark or Presto Jobs to AWS using Alluxio
In this office hour, we demonstrate how a “zero-copy burst” solution helps to speed up Spark and Presto queries in the public cloud while eliminating the process of manually copying and synchronizing data from the on-premise data lake to cloud storage. This approach allows compute frameworks to decouple from on-premise data sources and scale efficiently by leveraging Alluxio and public cloud resources such as AWS.

On Demand | Build a Hybrid Data Lake and Burst Processing to Google Cloud Dataproc with Alluxio
A joint tech talk with Google Cloud on how Alluxio can help burst your private computing environment to Google Cloud, minimizing costs and I/O overhead. A demo of running Alluxio and Dataproc is included!

On Demand | Optimizing Latency-Sensitive Queries for Presto at Facebook: A Collaboration Between Presto & Alluxio
For many latency-sensitive SQL workloads, Presto is often bottlenecked by retrieving remote data. In this talk, Rohit Jain from Facebook and Bin Fan from Alluxio introduce their teams’ collaboration on adding a local on-SSD Alluxio cache inside Presto workers at Facebook to speed up queries whose latency was unsatisfactory.

GOOD READS

Blog | Accelerating Analytics by 200% with Impala, Alluxio, and HDFS at Tencent

Blog | Improving Presto Latencies with Alluxio Data Caching at Facebook

Tutorial | Deep Learning at Alibaba Cloud with Alluxio – Running PyTorch on HDFS

Blog | Burst data lake processing to Dataproc using on-prem Hadoop data

Blog | Efficient Model Training in the Cloud with Kubernetes, TensorFlow, and Alluxio

Join our Slack channel!

Get your questions answered by the experts in our Slack community channel