ALLUXIO COMMUNITY NEWSLETTER

APRIL 2020


What’s to come in May


Accelerating Queries on Cloud Data Lakes
May 5 | Dataversity Webinar

Whether the goal is to work around limited Hadoop compute capacity or to increase data scientist efficiency, Alluxio’s “zero-copy” burst solution lets companies bridge remote data centers and data lakes with computing frameworks in other locations, so they can offload compute and leverage the flexibility, scalability, and power of the cloud for their remote data.

Optimizing Latency-Sensitive Queries for Presto at Facebook: A Collaboration Between Presto & Alluxio
May 7 | Global Online Meetup

For many latency-sensitive SQL workloads, Presto is bottlenecked by retrieving distant data. In this talk, Rohit Jain from Facebook and Bin Fan from Alluxio will introduce their teams’ collaboration on adding a local on-SSD Alluxio cache inside Presto workers at Facebook to speed up queries whose latency falls short of requirements.

Join the Alluxio community and ecosystem experts online for open discussions
May 12 | Open Online Office Hour

This is a casual online video chat where all attendees are welcome to bring their own questions. Our host Bin will have suggested topics, such as the top challenges around leveraging popular compute frameworks including Presto and Spark to access remote data, and the latest developments in Alluxio open source.

Recap of April


On-demand | Burst Presto & Spark workloads to AWS EMR with no data copies

This talk will dive into how to leverage any public cloud (AWS, Google Cloud Platform, or Microsoft Azure) to scale analytics workloads directly on on-prem data without copying and synchronizing the data into the cloud.

On-demand | Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio

Today, many people run deep learning applications with training data from separate storage such as object storage or remote data centers. This presentation demos the Intel Analytics Zoo + Alluxio stack, an architecture that delivers high performance while balancing cost and resource efficiency, without network I/O becoming the bottleneck.

On-demand | Scalable and Highly-available Distributed File System Metadata Service Using gRPC, RocksDB and RAFT

Alluxio (alluxio.io) is an open-source data orchestration system that provides a single namespace federating multiple external distributed storage systems. It is critical for Alluxio to be able to store and serve the metadata of all files and directories from all mounted external storage both at scale and at speed. This talk shares our design, implementation, and optimization of the Alluxio metadata service (master node) to address the scalability challenges.

New Partnership


We are excited to announce our partnership with Intel with the launch of an enhanced Hybrid Cloud Solution based on Intel Optane Persistent Memory! Learn more about the joint solution in our solution brief, “Get Insights Faster with Alluxio and Intel.”

Good Reads


Whitepaper | “Zero-Copy” Hybrid Cloud for Data Analytics – Strategy, Architecture and Benchmark Report

Blog | Serving Structured Data in Alluxio: Example

Blog | Serving Structured Data in Alluxio: Concept

Blog | Everything you want to know about how to decouple SQL engines from Hive Data Warehouse


Join our Slack channel! 

Get your questions answered by the experts in our Slack community channel