AI/ML Infra Meetup | A Faster and More Cost Efficient LLM Inference Stack

January 23, 2025

Junchen Jiang

Assistant Professor of Computer Science

University of Chicago

LLM inference can be expensive, particularly with long contexts. In this on-demand video, Junchen Jiang, Assistant Professor at the University of Chicago, presents a 10x solution for long-context inference: an easy-to-deploy stack over multiple vLLM engines with a tailored KV-cache backend.

Presentation Slides:

AI/ML Infra Meetup | A Faster and More Cost Efficient LLM Inference Stack from Alluxio, Inc.

Videos

AI/ML Infra Meetup | Open Source Michelangelo: Uber's Predictive to Generative end to end ML Lifecycle management platform

In this talk, Eric Wang, Senior Staff Software Engineer, introduces Uber’s open-source generative end-to-end ML lifecycle management platform: Michelangelo.

November 13, 2025

AI/ML Infra Meetup | Unlock the Future of Generative AI: TorchTitan's Latest Breakthroughs

In this talk, Jiani Wang, Software Engineer on Meta's PyTorch team, gives an overview of TorchTitan and dives into its latest advancements.

November 13, 2025

AI/ML Infra Meetup | Bringing Data to GPUs Anywhere + Get Low-Latency on Object Store with Alluxio

In this talk, Bin Fan, VP of Technology at Alluxio, explores how to enable efficient data access across distributed GPU infrastructure, achieving low-latency performance for feature stores and RAG workloads.

November 13, 2025
