Alluxio meetups, conferences, events and more

The latest Alluxio meetups, webinars, conferences and more

Past Events:

Build a hybrid data lake and burst processing to Google Cloud Dataproc with Alluxio

Alluxio Tech Talk * May 28, 2020

Join us for this tech talk, where we will show how Alluxio can help you burst your private computing environment to Google Cloud while minimizing costs and I/O overhead. Alluxio, coupled with Dataproc, Google’s managed service for open source data and analytics processing engines, enables zero-copy bursting for faster query performance in the cloud, so you can take advantage of compute resources that are not local to your data without having to manage copying or syncing that data.

Alluxio Open Office Hour

Open Online Office Hour * July 9, 2020

This is a casual online video chat where all attendees are welcome to bring their own questions. Our host Bin will have suggested topics, such as the top challenges of using popular compute frameworks, including Presto and Spark, to access remote data, and the latest developments in Alluxio open source, such as Alluxio Catalog Services.

Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio

Alluxio Global Online Meetup * April 23, 2020

Today, many people run deep learning applications on training data held in separate storage, such as object stores or remote data centers. This presentation will demo the Intel Analytics Zoo + Alluxio stack, an architecture that delivers high performance while balancing cost and resource efficiency, without network I/O becoming the bottleneck.

Optimizing Latency-Sensitive Queries for Presto at Facebook: A Collaboration Between Presto & Alluxio

Alluxio Global Online Meetup * May 7, 2020

For many latency-sensitive SQL workloads, Presto performance is often bound by the time spent retrieving remote data. In this talk, Rohit Jain and James Sun from Facebook and Bin Fan from Alluxio will introduce their teams’ collaboration on adding a local on-SSD Alluxio cache inside Presto workers at Facebook to speed up queries with unsatisfactory latency.

Burst Presto & Spark workloads to AWS EMR with no data copies

Community Online Office Hour * April 28, 2020

In this talk, we will show you how to leverage any public cloud (AWS, Google Cloud Platform, or Microsoft Azure) to scale analytics workloads directly on on-prem data, without copying or synchronizing the data into the cloud.

Scalable and Highly-available Distributed File System Metadata Service Using gRPC, RocksDB and RAFT

Community Online Office Hour * April 7, 2020

It is critical for Alluxio to store and serve the metadata of all files and directories from all mounted external storage, both at scale and at speed. This talk shares our design, implementation, and optimization of the Alluxio metadata service (master node) to address these scalability challenges.

Bursting Apache Spark Workloads to the Cloud on Remote Data

Community Online Office Hour * March 10, 2020

Accessing data to run analytic workloads in Spark across data centers and/or clouds can be challenging. Additionally, network I/O can bottleneck Spark jobs that need to read a large amount of data. A common solution is to deploy an HDFS cluster closer to Spark as a caching layer, manually copy the input data to HDFS first, and purge it afterward. But this ETL process can be both time-consuming and error-prone.

Optimizing Query Performance by Decoupling Presto and Hive Data Warehouse

Community Online Office Hour * March 24, 2020

Alluxio, as a data orchestration layer, provides physical data independence, allowing Presto to interact with the data more efficiently. In addition to caching for I/O acceleration, Alluxio also provides a catalog service to abstract the metadata in the Hive Metastore, and transformations to expose the data in a compute-optimized way. In this talk, we describe some of the challenges of using Presto with Hive, and introduce Alluxio data orchestration as a way to solve those challenges.