What’s New in Alluxio 2.3

Community Online Office Hour *

Alluxio 2.3 was just released at the end of June 2020. Calvin and Bin will go over the new features and integrations available and share learnings from the community. Any questions about the release and on-going community feature development are welcome.

Bursting Spark or Presto Jobs to AWS using Alluxio

Community Online Office Hour *

In this office hour, we demonstrate how a “zero-copy burst” solution helps to speed up Spark and Presto queries in the public cloud while eliminating the process of manually copying and synchronizing data from the on-premise data lake to cloud storage. This approach allows compute frameworks to decouple from on-premise data sources and scale efficiently by leveraging Alluxio and public cloud resources such as AWS.

Alluxio Open Office Hour

Open Online Office Hour *

This is a casual online video chat where all attendees are welcome to bring your own questions. Our host Bin will have suggested topics, such as the top challenges around leveraging popular compute frameworks including Presto and Spark to access remote data, and the latest developments in Alluxio open source such as Alluxio Catalog Services.

Burst Presto & Spark workloads to AWS EMR with no data copies

Community Online Office Hour *

In this talk, we will show you how to leverage any public cloud (AWS, Google Cloud Platform, or Microsoft Azure) to scale analytics workloads directly on on-prem data without copying and synchronizing the data into the cloud.

Bursting Apache Spark Workloads to the Cloud on Remote Data

Community Online Office Hour *

Accessing data to run analytic workloads in Spark across data centers and/or clouds can be challenging. Additionally, network I/O can bottleneck Spark jobs that need to read a large amount of data. A common solution is to deploy an HDFS cluster closer to Spark as a caching layer and manually copy the input data to HDFS first, purging it afterward. But this ETL process can be both time-consuming and also error-prone.

Optimizing Query Performance by Decoupling Presto and Hive Data Warehouse

Community Online Office Hour *

Alluxio, as a data orchestration layer provides the physical data independence, for Presto to interact with the data more efficiently. In addition to caching for IO acceleration, Alluxio also provides a catalog service to abstract the metadata in the Hive Metastore, and transformations to expose the data in compute-optimized way. In this talk, we describe some of the challenges of using Presto with Hive, and introduce Alluxio data orchestration for solving those challenges.

What’s new in Alluxio 2: from seamless operations to structured data management

Community Online Office Hour *

Alluxio 2.0 expands the system in three major directions including improving the operability of the system, having more advanced data management, as well as re-architecting the system to be able to scale to 1 billion + file. The system is now cloud native on AWS, Google Cloud, and allow users to enable native deployment with K8s. The new advanced data management enables data migration and replication from diff storage systems.

Testing Distributed System at Scale for the Cost of a Large Pizza on AWS

Community Online Office Hour *

Building distributed systems is no small feat. Software testing is just one of many critical practices that engineers who build these systems need to utilize to ensure the quality and usability of their software. For distributed systems, scaling out testing frameworks to ensure that enterprises who run our in highly distributed environments is a complicated (and expensive task!)