ml Archives | Alluxio

AI/ML Infra Meetup – Highlights & Key Takeaways

July 10, 2024 By Chanchan Mao

Co-hosted by Alluxio and Uber on May 23, 2024, AI/ML Infra Meetup was the community event for developers focused on building AI, ML and data infrastructure at scale. We were thrilled by the overwhelming interest and enthusiasm in our meetup! This event brought together over 100 AI/ML infrastructure engineers and enthusiasts to discuss the latest … Continued

Deconstructing a Machine Learning Pipeline with Virtual Data Lake

August 25, 2022

As more and more companies turn to AI / ML / DL to unlock insight, AI has become a mythical word that puts unnecessary barriers in front of new adopters. It is often regarded as a luxury reserved for big tech companies, but this should not be the case.

Tags: data lake, ml, product school

Architecting a Heterogeneous Data Platform Across Clusters, Regions, and Clouds

July 1, 2022

Alluxio foresaw the need for agility when accessing data across silos separated from compute engines like Spark, Presto, TensorFlow and PyTorch. Embracing the separation of storage from compute, the Alluxio data orchestration platform simplifies adoption of the data lake and data mesh paradigms for analytics and AI/ML. In this talk, Bin Fan will share observations to help identify ways to use the platform to meet the needs of your data environment and workloads.

Tags: ai, data lake, data mesh, data platform, ml

Alluxio on Kubernetes – Powering training through Container Storage Interface plugin

April 28, 2022

Shawn Sun from Alluxio will present the journey of using Alluxio as the storage system for Kubernetes through the Container Storage Interface (CSI) plugin and the Alluxio CSI driver. This talk will cover the challenges of the traditional setup for AI/ML training jobs and how the Alluxio CSI driver addresses them. It will also cover a recent change to the driver that made it more robust.

Tags: ai, alluxio day, CSI driver, kubernetes, ml, storage

Recommendations to Level Up Your Machine Learning Platform

April 12, 2022 By Bin Fan

With machine learning (ML) and artificial intelligence (AI) applications becoming more business-critical, organizations are racing to advance their AI/ML capabilities. Realizing the full potential of AI/ML requires the right underlying machine learning platform.

Orchestrating Data for Machine Learning Pipelines

April 8, 2022 By Bin Fan

This article will discuss a new solution for orchestrating data in end-to-end machine learning pipelines that addresses the above questions. I will outline common challenges and pitfalls, then propose a new technique, data orchestration, to optimize the data pipeline for machine learning.

Architecting a Heterogeneous Data Platform Across Clusters, Regions, and Clouds

January 27, 2022

Data platform teams are increasingly challenged with accessing multiple data stores that are separated from compute engines, such as Spark, Presto, TensorFlow or PyTorch. Whether your data is distributed across multiple datacenters, clouds, or both, a successful heterogeneous data platform requires efficient data access. Alluxio enables you to embrace the separation of storage from compute and use Alluxio data orchestration to simplify adoption of the data lake and data mesh paradigms for analytics and AI/ML workloads.

Tags: ai, analytics, cloud, compute, data orchestration, data platform, data stores, ml, storage

Thousand-Node Alluxio Cluster Powers Game AI Platform – A Production Case Study from Tencent

January 26, 2022 By Bing Zheng, Baolong Mao and Zhizheng Pan

To provide model training with the best experience, Tencent has implemented a 1000-node Alluxio cluster and designed a scalable, robust, and performant architecture to speed up Ceph storage for game AI training. This blog will give you insight into how Alluxio has been implemented and optimized at Tencent.

Architecting a Heterogeneous Data Platform Across Clusters, Regions, and Clouds

Alluxio Product School * January 27, 2022

Whether your data is distributed across multiple datacenters, clouds, or both, a successful heterogeneous data platform requires efficient data access. Alluxio enables you to embrace the separation of storage from compute and use Alluxio data orchestration to simplify adoption of the data lake and data mesh paradigms for analytics and AI/ML workloads.

Tag: ml