Building a Distributed File System For The Cloud-Native Era

Big Data Bellevue Meetup, May 19, 2022. Today, data engineering in modern enterprises has become increasingly complex and resource-consuming, particularly because (1) the wealth of organizational data is often distributed across data centers, cloud regions, or even cloud providers, and (2) the complexity of the big data stack has been quickly increasing over …

Alluxio on Kubernetes – Powering training through Container Storage Interface plugin

Shawn Sun from Alluxio will present the journey of using Alluxio as the storage system for Kubernetes through the Container Storage Interface (CSI) plugin and the Alluxio CSI driver. This talk will cover the challenges we face with the traditional setup for AI/ML training jobs, and how the Alluxio CSI driver addresses them. It will also cover a recent change to the driver that made it more robust.

Spark + Alluxio Overview | Pair Spark with Alluxio to Modernize Your Data Platform

By bringing Alluxio together with Spark, you can modernize your data platform in a scalable, agile, and cost-effective way. In this post, we provide an overview of the Spark + Alluxio stack. We explain the architecture, discuss real-world examples, describe deployment models, and showcase performance and cost benchmarking.

Architecting a Heterogeneous Data Platform Across Clusters, Regions, and Clouds

Data platform teams are increasingly challenged with accessing multiple data stores that are separated from compute engines, such as Spark, Presto, TensorFlow, or PyTorch. Whether your data is distributed across multiple datacenters, clouds, or both, a successful heterogeneous data platform requires efficient data access. Alluxio enables you to embrace the separation of storage from compute and use Alluxio data orchestration to simplify adoption of the data lake and data mesh paradigms for analytics and AI/ML workloads.

Thousand-Node Alluxio Cluster Powers Game AI Platform – A Production Case Study from Tencent

Tencent is one of the largest technology companies in the world and a leader in the gaming sector. Its game AI platform supports AI research and development at Tencent. To give model training the best possible experience, Tencent has implemented a 1000-node Alluxio cluster and designed a scalable, robust, and performant architecture to accelerate game AI training.

Architecting a Heterogeneous Data Platform Across Clusters, Regions, and Clouds

Alluxio Product School

Whether your data is distributed across multiple datacenters, clouds, or both, a successful heterogeneous data platform requires efficient data access. Alluxio enables you to embrace the separation of storage from compute and use Alluxio data orchestration to simplify adoption of the data lake and data mesh paradigms for analytics and AI/ML workloads.