Alluxio meetups, conferences, events and more

The latest Alluxio meetups, webinars, conferences and more

Events

Bursting Apache Spark Workloads to the Cloud on Remote Data

Community Online Office Hour *

Accessing data to run analytic workloads in Spark across data centers and/or clouds can be challenging. Additionally, network I/O can bottleneck Spark jobs that need to read a large amount of data. A common solution is to deploy an HDFS cluster closer to Spark as a caching layer and manually copy the input data to HDFS first, purging it afterward. But this ETL process can be both time-consuming and also error-prone.

Testing Distributed System at Scale for the Cost of a Large Pizza on AWS

Community Online Office Hour *

Building distributed systems is no small feat. Software testing is just one of many critical practices that engineers who build these systems need to utilize to ensure the quality and usability of their software. For distributed systems, scaling out testing frameworks to ensure that enterprises who run our in highly distributed environments is a complicated (and expensive task!)

Running Presto with Alluxio on Amazon EMR

Community Online Office Hour *

Many organizations are leveraging EMR to run big data analytics on public cloud. However, reading and writing data to S3 directly can result in slow and inconsistent performance. Alluxio is a data orchestration layer for the cloud, and in this use case it caches data for S3, ensuring high and predictable performance as well as reduced network traffic.

How to Develop and Operate Cloud-Native Data Platforms and Applications

Alluxio Global Online Meetup *

This talk will overview two projects at Electronic Arts (EA) that address the mismatch by data orchestration: One project automatically generates configurations for all components in a large monitoring system, which reduces the daily average number of alerts from ~1000 to ~20. The other project introduces Alluxio for caching and unifying address space across ETL and analytics workloads, which substantially simplifies architecture, improves performance, and reduces ops overheads.

What’s new in Alluxio 2: from seamless operations to structured data management

Community Online Office Hour *

Alluxio 2.0 expands the system in three major directions including improving the operability of the system, having more advanced data management, as well as re-architecting the system to be able to scale to 1 billion + file. The system is now cloud native on AWS, Google Cloud, and allow users to enable native deployment with K8s. The new advanced data management enables data migration and replication from diff storage systems.

CNCF Member Webinar: Improving Data Locality for Analytics Jobs on Kubernetes Using Alluxio

CNCF Webinar *

In the on-prem days, one key performance optimization for Apache Hadoop or Apache Spark workloads is to run tasks on nodes with local HDFS data. However, while adoption of the Cloud & Kubernetes makes scaling compute workloads exceptionally easy, HDFS is often not an option. Effectively accessing data from cloud-native storage services like AWS S3 or even on-premises HDFS becomes harder as data locality is lost.

Speeding Up I/O for Machine Learning

Alluxio Global Online Meetup *

This talk will guide the audience on how Alluxio can greatly simplify the data preparation phase in with remote and possibly multiple data sources. We will share the lessons and benchmark from Bill Zhao an engineer led in Apple when building a Machine Learning platform using Tensorflow, NFS, DC/OS and Alluxio.