AI/ML Infra Meetup at Uber

The Community Event For Developers Building AI/ML/Data Infrastructure At Scale

Thursday May 23, 2024 | Uber’s Sunnyvale Office & Virtual


Join leading AI/ML infrastructure experts for the AI/ML Infra Meetup hosted by Alluxio and Uber. This is a premier opportunity to engage and discuss the latest in ML pipelines, AI/ML infrastructure, LLMs, RAG, GPUs, PyTorch, HuggingFace, and more.

This meetup will be held in person at Uber Sunnyvale and live-streamed. Experts from Uber, NVIDIA, Alluxio, and UChicago will give talks sharing insights and real-world examples on optimizing data pipelines, accelerating model training and serving, designing scalable architectures, and more.

Immerse yourself in learning, networking, and conversation, and enjoy the mix-and-mingle happy hour at the end. Dinner and drinks are on us!

SPEAKERS

Qiushen Wang

@Uber

Sr Staff Software Engineer

Xiande Cao

@NVIDIA

Senior Deep Learning Software Engineering Manager

Junchen Jiang

@University of Chicago

Assistant Professor of Computer Science

Lu Qiu

@Alluxio

Tech Lead

Siyuan Sheng

@Alluxio

Sr Software Engineer

Tarik Bennett

@Alluxio

Sr Solutions Engineer

SCHEDULE-AT-A-GLANCE

See You Soon!

Times are listed in Pacific Daylight Time (PDT). The agenda is subject to change.

4:00pm – 5:00pm Registration & Networking

Uber builds and maintains one of the largest-scale ML infrastructures, with over 1,000 pipelines running daily to train the extensive number of models used across various aspects of the business. Data pipelines at this massive scale pose challenges to the ML platform, including the speed and efficiency of data retrieval and the utilization of both CPU and GPU resources.

In this session, Qiushen Wang will share how caching is leveraged to optimize data access in the model training process on Uber’s ML infrastructure. By enabling shared caching mechanisms, the system reduces redundancy and improves data access times across various projects, leading to a more cohesive and efficient ML ecosystem.

What you will learn:

  • Enhancing data access speed and efficiency
  • Optimizing both CPU and GPU utilization by avoiding data stalls in training epochs
  • Sharing data infrastructure for collaborative project efficiency
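The shared-caching idea can be illustrated with a minimal, framework-free sketch (plain Python; `fetch_sample` and the cache layout are illustrative stand-ins, not Uber’s actual implementation):

```python
# Stand-in for an expensive read of a training sample from remote storage.
def fetch_sample(path):
    return f"data-for-{path}"

class SharedCache:
    """Minimal sketch of a cache shared across training pipelines:
    the first pipeline to read a sample pays the storage cost, and
    later reads (from any pipeline) are served from the cache."""
    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get(self, path):
        if path in self._store:
            self.hits += 1
            return self._store[path]
        self.misses += 1
        self._store[path] = fetch_sample(path)
        return self._store[path]

cache = SharedCache()
# Two "pipelines" reading overlapping data fetch each sample only once.
for _ in range(2):
    for p in ["sample-1", "sample-2", "sample-3"]:
        cache.get(p)
print(cache.misses, cache.hits)  # → 3 3
```

The second pass is served entirely from the cache, which is the redundancy reduction the talk describes, applied across projects rather than within one process.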
Speakers:
Qiushen Wang has been a software engineer on Uber’s Michelangelo team since 2020, focused on maintaining high ML quality across all models and pipelines. Prior to this, he worked on Uber’s Marketplace Fares team from 2018 to 2020, developing fare systems for various services. Before that, he worked in Australia, building a strong foundation in software engineering at notable companies including eBay, Qantas, and Equifax.

Speed and efficiency are two key requirements for the infrastructure underlying machine learning model development. Data access can bottleneck end-to-end machine learning pipelines as training data volumes grow and as large model files become more common in serving. For instance, data loading can account for nearly 80% of total model training time, resulting in less than 30% GPU utilization. Loading large model files for deployment to production can also be slow because of slow network or storage reads. These challenges are prevalent when popular frameworks like PyTorch, Ray, or HuggingFace are paired with cloud object storage such as S3 or GCS, or when downloading models from the HuggingFace model hub.

In this presentation, Lu and Siyuan will offer comprehensive insights into improving speed and GPU utilization for model training and serving. You will learn:

  • The data loading challenges hindering GPU utilization
  • The reference architecture for running PyTorch and Ray jobs while reading data from S3, with benchmark results of training ResNet50 and BERT
  • Real-world examples of boosting model performance and GPU utilization through optimized data access
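As a rough illustration of why avoiding data stalls raises utilization, here is a minimal sketch (plain Python, with `time.sleep` standing in for storage reads and GPU compute; this is not the actual Alluxio/PyTorch setup) that overlaps data loading with compute using a background prefetch thread:

```python
import queue
import threading
import time

def load_batch(i):
    time.sleep(0.01)  # stand-in for a slow S3/storage read
    return i

def compute_step(batch):
    time.sleep(0.01)  # stand-in for the GPU forward/backward pass
    return batch

def train(num_batches, prefetch):
    """Return wall-clock time for a toy training loop.
    With prefetch=True, a producer thread loads batches ahead of time,
    so compute never stalls waiting for data."""
    if not prefetch:
        start = time.perf_counter()
        for i in range(num_batches):
            compute_step(load_batch(i))
        return time.perf_counter() - start

    q = queue.Queue(maxsize=4)

    def producer():
        for i in range(num_batches):
            q.put(load_batch(i))
        q.put(None)  # sentinel: no more batches

    start = time.perf_counter()
    t = threading.Thread(target=producer)
    t.start()
    while (batch := q.get()) is not None:
        compute_step(batch)
    t.join()
    return time.perf_counter() - start

serial = train(20, prefetch=False)
overlapped = train(20, prefetch=True)
print(overlapped < serial)  # prefetching hides most of the load time
```

In the serial loop, every load stalls the compute step; with prefetching, loading and compute run concurrently, which is the same effect that caching and prefetch layers provide for real GPU training jobs.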
Speakers:
Lu Qiu is a Data & AI Platform Tech Lead at Alluxio and a PMC maintainer of the open source project Alluxio. Lu develops big data solutions for AI/ML training. Before that, Lu was responsible for core Alluxio components including leader election, journal management, and metrics management. Lu received an M.S. in Data Science from George Washington University.

Siyuan Sheng is a senior software engineer at Alluxio. Previously, he worked as a software engineer on Rubrik’s Appflows team. Siyuan received his M.S. in Computer Science from CMU. He also loves snowboarding in his spare time.

Prefill in LLM inference is known to be resource-intensive, especially for long LLM inputs. While better scheduling can mitigate prefill’s impact, it would be fundamentally better to avoid (most of) prefill. This talk introduces our preliminary effort toward drastically minimizing prefill delay for LLM inputs that naturally reuse text chunks, such as in retrieval-augmented generation. While keeping the KV caches of all text chunks in memory is difficult, we show that it is possible to store them on cheaper yet slower storage. By improving the loading process of the reused KV caches, we can still significantly reduce prefill delay while maintaining the same generation quality.
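A toy sketch of the chunk-level KV-cache reuse idea (plain Python; all names are illustrative, and it deliberately ignores position dependence and cross-chunk attention, which a real system must handle):

```python
import hashlib

# Chunk hash -> precomputed "KV cache" (a string stands in for real tensors,
# which in the talk's setting would live on cheaper, slower storage).
kv_store = {}

def compute_kv(chunk):
    # Stand-in for the expensive prefill pass over one text chunk.
    return f"kv({chunk})"

def prefill(chunks):
    """Reuse stored KV caches for chunks seen before; only new chunks
    pay the prefill cost. Returns the KV list and the number of chunks
    that were actually recomputed."""
    recomputed = 0
    kvs = []
    for chunk in chunks:
        key = hashlib.sha256(chunk.encode()).hexdigest()
        if key not in kv_store:
            kv_store[key] = compute_kv(chunk)
            recomputed += 1
        kvs.append(kv_store[key])
    return kvs, recomputed

# RAG-style prompts: retrieved documents repeat across queries,
# only the user question changes.
_, n1 = prefill(["doc-A", "doc-B", "question-1"])
_, n2 = prefill(["doc-A", "doc-B", "question-2"])
print(n1, n2)  # → 3 1
```

The second query recomputes only its new question chunk; the retrieved documents' KV caches are loaded rather than re-prefilled, which is where the delay savings come from.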
Speakers:
Junchen Jiang is an Assistant Professor of Computer Science at the University of Chicago. He received his Ph.D. from CMU in 2017 and his bachelor’s degree from Tsinghua in 2011. His research interests are networked systems and their intersections with machine learning. He has received a Google Faculty Research Award, an NSF CAREER Award, and a CMU Computer Science Doctoral Dissertation Award. https://people.cs.uchicago.edu/~junchenj/

From Caffe to MXNet to PyTorch and beyond, Xiande Cao, Senior Deep Learning Software Engineering Manager at NVIDIA, will share his perspective on the evolution of deep learning frameworks.
Speakers:
Dr. Xiande (Triston) Cao is a Senior Deep Learning Software Engineering Manager at NVIDIA. He collaborates with the open-source community on deep learning and graph neural networks, leveraging the NVIDIA software stack, GPUs, and AI systems to enhance the capabilities of AI. He received his Ph.D. in Electrical Engineering from the University of Kentucky.

6:20pm – 7:30pm Happy Hour | Food and drinks are on us!