On-Demand Videos

video

AI/ML Infra Meetup | Open Source Michelangelo: Uber's Predictive to Generative end to end ML Lifecycle management platform

In this talk, Eric Wang, Senior Staff Software Engineer, introduces Uber’s open-source generative end-to-end ML lifecycle management platform: Michelangelo.

Watch now

video

AI/ML Infra Meetup | Unlock the Future of Generative AI: TorchTitan's Latest Breakthroughs

In this talk, Jiani Wang, Software Engineer on Meta's PyTorch team, gives an overview of TorchTitan and dives into its latest advancements.

Watch now

video

AI/ML Infra Meetup | Bringing Data to GPUs Anywhere + Get Low-Latency on Object Store with Alluxio

In this talk, Bin Fan, VP of Technology at Alluxio, explores how to enable efficient data access across distributed GPU infrastructure, achieving low-latency performance for feature stores and RAG workloads.

Watch now

video

AI/ML Infra Meetup | Optimizing ML Data Access with Alluxio: Preprocessing, Pretraining, & Inference at Scale

In this talk, Bin Fan shares his insights on data access challenges in ML applications, with particular emphasis on how Alluxio's distributed caching helps bridge the gap between storage and compute in preprocessing, pretraining and inference.

Watch now

video

AI/ML Infra Meetup | Deployment, Discovery and Serving of LLMs at Uber Scale

Watch this video to gain insights on how Uber manages its Generative AI Gateway, which powers all generative AI applications across the company.

Watch now

video

What’s New in Alluxio AI: 3X Faster Checkpoint File Creation, New Cache Eviction Policies, Python SDK enhancements, and more

Join us to learn about the latest release of Alluxio Enterprise AI. In this webinar, we’ll provide an overview of the new features and capabilities of Alluxio Enterprise AI, built to accelerate AI workloads and maximize GPU utilization.

Key highlights include:

New caching mode accelerates AI checkpoints
Advanced cache eviction policies provide fine-grained control
Python SDK integrations enhance AI framework compatibility
A demo of Alluxio accelerating AI training workloads in AWS

Watch now

video

AI/ML Infra Meetup | Balancing Cost, Performance, and Scale - Running GPU/CPU Workloads in the Cloud

Ready to optimize your AI infra strategy? In this on-demand video, Bin Fan, VP of Technology at Alluxio, explains how to balance cost and performance for GPU/CPU workloads.

Watch now

video

AI/ML Infra Meetup | A Faster and More Cost Efficient LLM Inference Stack

LLM inference can be costly, particularly with long contexts. In this on-demand video, Junchen Jiang, Assistant Professor at the University of Chicago, presents a 10x solution for long-context inference: an easy-to-deploy stack over multiple vLLM engines with a tailored KV-cache backend.

Watch now

video

AI/ML Infra Meetup | Three Developments in AI Infra

You won't want to miss this talk presented by Robert Nishihara, Co-Founder of Anyscale, which is packed with insights on using Ray to conquer the last-mile challenges in AI deployment.

Watch now

video

Accelerate AI: Alluxio 101

In the rapidly evolving landscape of AI and machine learning, Platform and Data Infrastructure Teams face critical challenges in building and managing large-scale AI platforms. Performance bottlenecks, scalability of the platform, and scarcity of GPUs pose significant challenges in supporting large-scale model training and serving.

In this talk, we introduce how Alluxio helps Platform and Data Infrastructure teams deliver faster, more scalable platforms to ML Engineering teams developing and training AI models. Alluxio’s highly distributed cache accelerates AI workloads by eliminating data loading bottlenecks and maximizing GPU utilization. Customers report up to 4x faster training performance with high-speed access to petabytes of data spread across billions of files regardless of persistent storage type or proximity to GPU clusters. Alluxio’s architecture lowers data infrastructure costs, increases GPU utilization, and enables workload portability for navigating GPU scarcity challenges.

Watch now

video

AI/ML Infra Meetup | The power of Ray in the era of LLM and multi-modality AI

In this talk, Zhe Zhang (NVIDIA, ex-Anyscale) introduced Ray and its applications in the LLM and multi-modal AI era. He shared his perspective on ML infrastructure, noting that it presents more unstructured challenges, and recommended using Ray and Alluxio as solutions for increasingly data-intensive multi-modal AI workloads.

Watch now

video

AI/ML Infra Meetup | Exploring Distributed Caching for Faster GPU Training with NVMe GDS and RDMA

As large-scale machine learning becomes increasingly GPU-centric, modern high-performance hardware such as NVMe storage and RDMA networks (InfiniBand or specialized NICs) is becoming more widespread. To fully leverage these resources, it’s crucial to build a balanced architecture that avoids GPU underutilization. In this talk, we will explore various strategies to address this challenge by effectively utilizing these advanced hardware components. Specifically, we will present experimental results from building a Kubernetes-native distributed caching layer that uses NVMe storage and high-speed RDMA networks to optimize data access for PyTorch training.

Watch now

video

AI/ML Infra Meetup | Big Data and AI

In this talk, Sandeep Manchem discussed big data and AI, covering typical platform architecture and data challenges. We had engaging discussions about ensuring data safety and compliance in Big Data and AI applications.

Watch now

video

AI/ML Infra Meetup | TorchTitan, One-stop PyTorch native solution for production ready LLM pre-training

TorchTitan is a proof of concept for large-scale LLM training using native PyTorch. It is a repo that showcases PyTorch's latest distributed training features in a clean, minimal codebase.

In this talk, Tianyu will share TorchTitan’s design and optimizations for the Llama 3.1 family of LLMs, spanning 8 billion to 405 billion parameters, and showcase its performance, composability, and scalability.

Watch now

video

Model Training Across Regions and Clouds – Challenges, Solutions and Live Demo

AI training workloads running on compute engines like PyTorch, TensorFlow, and Ray require consistent, high-throughput access to training data to maintain high GPU utilization. However, with the decoupling of compute and storage and with today’s hybrid and multi-cloud landscape, AI Platform and Data Infrastructure teams are struggling to cost-effectively deliver the high-performance data access needed for AI workloads at scale.

Join Tom Luckenbach, Alluxio Solutions Engineering Manager, to learn how Alluxio enables high-speed, cost-effective data access for AI training workloads in hybrid and multi-cloud architectures, while eliminating the need to manage data copies across regions and clouds.

What Tom will share:

AI data access challenges in cross-region, cross-cloud architectures.
The architecture and integration of Alluxio with frameworks like PyTorch, TensorFlow, and Ray using POSIX, REST, or Python APIs across AWS, GCP and Azure.
A live demo of an AI training workload accessing cross-cloud datasets leveraging Alluxio's distributed cache, unified namespace, and policy-driven data management.
MLPerf and FIO benchmark results and cost-savings analysis.

Watch now