On-Demand Videos

video

AI/ML Infra Meetup | Open Source Michelangelo: Uber's Predictive to Generative end to end ML Lifecycle management platform

In this talk, Eric Wang, Senior Staff Software Engineer, introduces Michelangelo, Uber’s open-source, end-to-end ML lifecycle management platform that spans predictive to generative ML.

Watch now

video

AI/ML Infra Meetup | Unlock the Future of Generative AI: TorchTitan's Latest Breakthroughs

In this talk, Jiani Wang, Software Engineer on Meta's PyTorch team, gives an overview of TorchTitan and covers its latest advancements.

Watch now

video

AI/ML Infra Meetup | Bringing Data to GPUs Anywhere + Get Low-Latency on Object Store with Alluxio

In this talk, Bin Fan, VP of Technology at Alluxio, explores how to enable efficient data access across distributed GPU infrastructure, achieving low-latency performance for feature stores and RAG workloads.

Watch now

video

Optimize, Don’t Overspend: Data Caching Strategy for AI Workloads

As machine learning and deep learning models grow in complexity, AI platform engineers and ML engineers face significant challenges with slow data loading and low GPU utilization, often leading to costly investments in high-performance computing (HPC) storage. However, this approach can result in overspending without addressing the core issues of data bottlenecks and infrastructure complexity.

A better approach is to add a data caching layer, such as Alluxio, between compute and storage as a cost-effective alternative to HPC storage. In this webinar, Jingwen will explore how Alluxio's caching solutions optimize AI workloads for performance, user experience and cost-effectiveness.

What you will learn:

The I/O bottlenecks that slow down data loading in model training
How Alluxio's data caching strategy optimizes I/O performance for training and GPU utilization, and significantly reduces cloud API costs
The architecture and key capabilities of Alluxio
Using Rapid Alluxio Deployer to install Alluxio and run benchmarks in AWS in just 30 minutes

Watch now

video

AI/ML Infra Meetup | OpenAI: Preference Tuning and Fine Tuning LLMs

Ankit Khare, Developer Experience Engineer at OpenAI, provides practical insights for AI enthusiasts on effectively customizing and leveraging LLMs in various applications through preference tuning and fine-tuning.

Watch now

video

What’s new in Alluxio Enterprise AI 3.2: Leverage GPU Anywhere, Pythonic Filesystem API, Write Checkpointing and more

In today’s AI-driven world, organizations face unprecedented demands for powerful AI infrastructure to fuel their model training and serving workloads. Performance bottlenecks, cost inefficiencies, and management complexities pose significant challenges for AI platform teams supporting large-scale model training and serving. On July 9, 2024, we introduced Alluxio Enterprise AI 3.2, a groundbreaking solution designed to address these critical issues in the ever-evolving AI landscape.

In this webinar, Shouwei Chen introduced exciting new features of Alluxio Enterprise AI 3.2:

Leveraging GPU resources anywhere by accessing remote data with the same performance as local data
Enhanced I/O performance with 97%+ GPU utilization for popular language model training benchmarks
Achieving the same performance as HPC storage on an existing data lake without additional HPC storage infrastructure
New Python FileSystem API to seamlessly integrate with Python applications like Ray
Other new features, including advanced cache management, rolling upgrades, and CSI failover

Watch now

video

10x Faster Trino Queries on Your Data Platform

As Trino users increasingly rely on cloud object storage for retrieving data, speed and cloud cost have become major challenges. The separation of compute and storage creates latency challenges when querying datasets; scanning data between storage and compute tiers becomes I/O bound. In addition, cloud API costs related to GET/LIST operations and cross-region data transfer add up quickly.

The newly introduced Trino file system cache by Alluxio aims to overcome these challenges. In this session, Jianjian will dive into Trino data caching strategies, share the latest test results, and discuss the multi-level caching architecture. This architecture makes Trino 10x faster for data lakes of any scale, from GB to EB.

What you will learn:

Challenges relating to the speed and costs of running Trino in the cloud
The new Trino file system cache feature overview, including the latest development status and test results
A multi-level cache framework for maximized speed, including Trino file system cache and Alluxio distributed cache
Real-world cases, including a large online payment firm and a top ridesharing company
The future roadmap of Trino file system cache and Trino-Alluxio integration
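
As a rough illustration of the multi-level cache idea in the list above, the following is a minimal Python sketch of a read-through lookup that checks a small local tier, then a larger shared tier, and only then falls back to object storage. It is not Trino's or Alluxio's implementation: `LRUCache`, `MultiLevelReader`, the tier capacities, and the dict standing in for the object store are all hypothetical names chosen for this example.

```python
from collections import OrderedDict


class LRUCache:
    """A tiny in-memory LRU cache standing in for one cache tier (hypothetical)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used entry


class MultiLevelReader:
    """Read-through lookup: local tier, then shared tier, then object store."""

    def __init__(self, local, shared, object_store):
        self.local = local
        self.shared = shared
        self.object_store = object_store  # a dict standing in for S3-like storage
        self.store_reads = 0  # counts the slow, billable reads

    def read(self, key):
        value = self.local.get(key)
        if value is not None:
            return value
        value = self.shared.get(key)
        if value is None:
            value = self.object_store[key]  # slowest path: go to object storage
            self.store_reads += 1
            self.shared.put(key, value)
        self.local.put(key, value)  # populate the faster tier on the way back
        return value


store = {"part-0": b"a", "part-1": b"b", "part-2": b"c"}
reader = MultiLevelReader(LRUCache(1), LRUCache(2), store)
for key in ["part-0", "part-1", "part-0", "part-1", "part-2", "part-0"]:
    reader.read(key)
print(reader.store_reads)  # number of reads that reached the object store
```

In this toy run, repeated reads of `part-0` and `part-1` are served from the shared tier even after the single-entry local tier evicts them, which is the effect a second cache level is meant to provide.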

Watch now

video

AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & Serving

Speed and efficiency are two requirements for the underlying infrastructure for machine learning model development. Data access can bottleneck end-to-end machine learning pipelines as training data volume grows and when large model files are more commonly used for serving. For instance, data loading can constitute nearly 80% of the total model training time, resulting in less than 30% GPU utilization. Also, loading large model files for deployment to production can be slow because of slow network or storage read operations. These challenges are prevalent when using popular frameworks like PyTorch, Ray, or Hugging Face, paired with cloud object storage solutions like S3 or GCS, or downloading models from the Hugging Face model hub.

In this presentation, Lu and Siyuan will offer comprehensive insights into improving speed and GPU utilization for model training and serving. You will learn:

The data loading challenges hindering GPU utilization
The reference architecture for running PyTorch and Ray jobs while reading data from S3, with benchmark results of training ResNet50 and BERT
Real-world examples of boosting model performance and GPU utilization through optimized data access

Watch now

video

AI/ML Infra Meetup | Perspective on Deep Learning Framework

From Caffe to MXNet, to PyTorch, and more, Xiande Cao, Senior Deep Learning Software Engineer Manager, will share his perspective on the evolution of deep learning frameworks.

Watch now

video

AI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAG

Prefill in LLM inference is known to be resource-intensive, especially for long LLM inputs. While better scheduling can mitigate prefill’s impact, it would be fundamentally better to avoid (most of) prefill. This talk introduces our preliminary effort towards drastically reducing prefill delay for LLM inputs that naturally reuse text chunks, such as in retrieval-augmented generation. While keeping the KV cache of all text chunks in memory is difficult, we show that it is possible to store them on cheaper yet slower storage. By improving the loading process of the reused KV caches, we can still significantly reduce prefill delay while maintaining the same generation quality.
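
To make the reuse idea concrete, here is a minimal Python sketch, not the speakers' system, of keying a stored "KV cache" by the hash of a text chunk so that a repeated chunk is loaded from disk instead of being recomputed. Everything named here is an assumption for illustration: `fake_prefill` is a placeholder for the real prefill pass, and `ChunkKVStore` is a hypothetical helper. Real KV caches also depend on the tokens that precede a chunk, which this toy ignores.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path


def fake_prefill(chunk):
    """Placeholder for the expensive prefill pass; returns a stand-in 'KV cache'."""
    return [ord(c) for c in chunk]


class ChunkKVStore:
    """Hypothetical store that persists per-chunk KV data on slower, cheaper storage."""

    def __init__(self, root):
        self.root = Path(root)
        self.prefill_calls = 0  # how many times the expensive path ran

    def _path(self, chunk):
        digest = hashlib.sha256(chunk.encode("utf-8")).hexdigest()
        return self.root / f"{digest}.kv"

    def get_kv(self, chunk):
        path = self._path(chunk)
        if path.exists():
            # Reused chunk: load the stored result instead of recomputing it.
            with path.open("rb") as f:
                return pickle.load(f)
        kv = fake_prefill(chunk)
        self.prefill_calls += 1
        with path.open("wb") as f:
            pickle.dump(kv, f)
        return kv


with tempfile.TemporaryDirectory() as tmp:
    kv_store = ChunkKVStore(tmp)
    # Two RAG-style requests that share the retrieved chunk "doc B".
    for chunk in ["doc A", "doc B", "doc B", "doc C"]:
        kv_store.get_kv(chunk)
    prefill_calls = kv_store.prefill_calls

print(prefill_calls)  # prefill runs once per distinct chunk
```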

Watch now

video

AI/ML Infra Meetup | ML explainability in Michelangelo

Uber has numerous deep learning models, most of which are highly complex with many layers and a vast number of features. Understanding how these models work is challenging and demands significant resources to experiment with various training algorithms and feature sets. With ML explainability, the ML team aims to bring transparency to these models, helping to clarify their predictions and behavior. This transparency also assists the operations and legal teams in explaining the reasons behind specific prediction outcomes.

In this talk, Eric Wang will discuss the methods Uber used for explaining deep learning models and how the team integrated these methods into the Uber AI Michelangelo ecosystem to support offline explanation.

Watch now

video

Simplify Data Access for AI in Multi-Cloud

Running AI/ML workloads in different clouds presents unique challenges. The key to a manageable multi-cloud architecture is the ability to seamlessly access data across environments with high performance and low cost.

This webinar is designed for data platform engineers, data infra engineers, data engineers, and ML engineers who work with multiple data sources in hybrid or multi-cloud environments. Chanchan and Bin will guide the audience through using Alluxio to greatly simplify data access and make model training and serving more efficient in these environments.

You will learn:

How to access data in multi-region, hybrid, and multi-cloud environments as if it were on a local file system
How to run PyTorch to read datasets and write checkpoints to remote storage with Alluxio as the distributed data access layer
Real-world examples and insights from tech giants like Uber, AliPay and more

Watch now

video

Cloud-Native Model Training on Distributed Data

Cloud-native model training jobs require fast data access to achieve shorter training cycles. Accessing data can be challenging when your datasets are distributed across different regions and clouds. Additionally, as GPUs remain scarce and expensive resources, it becomes more common to set up training clusters that are remote from where the data resides. This multi-region/cloud scenario introduces the challenge of losing data locality, resulting in operational overhead, latency, and high cloud costs.

In the third webinar of the multi-cloud webinar series, Chanchan and Shawn dive deep into:

The data locality challenges in the multi-region/cloud ML pipeline
Using a cloud-native distributed caching system to overcome these challenges
The architecture and integration of PyTorch/Ray+Alluxio+S3 using POSIX or RESTful APIs
Live demo with ResNet and BERT benchmark results showing performance gains and cost savings analysis

Watch now

video

Enhancing Python Data Loading in the Cloud for AI/ML

In this presentation, Bin Fan (VP of Open Source @ Alluxio) will address a critical challenge of optimizing data loading for distributed Python applications within AI/ML workloads in the cloud, focusing on popular frameworks like Ray and Hugging Face. Integration of Alluxio’s distributed caching for Python applications is accomplished using the fsspec interface, thus greatly improving data access speeds. This is particularly useful in machine learning workflows, where repeated data reloading across slow, unstable or congested networks can severely affect GPU efficiency and escalate operational costs.

Attendees can look forward to practical, hands-on demonstrations showcasing the tangible benefits of Alluxio’s caching mechanism across various real-world scenarios. These demos will highlight the enhancements in data efficiency and overall performance of data-intensive Python applications. This presentation is tailored for developers and data scientists eager to optimize their AI/ML workloads. Discover strategies to accelerate your data processing tasks, making them not only faster but also more cost-efficient.

Watch now

video

Why a Multi-Cloud Strategy Matters for Your AI Platform

As GenAI and AI continue to transform businesses, scaling these workloads requires optimized underlying infrastructure. A multi-cloud architecture allows organizations to leverage different cloud services to meet diverse workload demands while maximizing efficiency, reducing costs, and avoiding vendor lock-in. However, achieving a multi-cloud vision can be challenging.

In this webinar, Tarik will share how an agnostic data layer, like Alluxio, allows you to embrace the separation of storage from compute and simplify the adoption of multi-cloud for AI.

Learn why leveraging multiple cloud providers is critical for balancing performance, scalability, and cost of your AI platform
Discover how an agnostic data layer like Alluxio provides seamless data access in multi-cloud that bridges storage and compute without data replication
Gain insights into real-world examples and best practices for deploying AI across on-prem, hybrid, and multi-cloud environments

Watch now