On-Demand Video

AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & Serving

Speed and efficiency are two key requirements for the infrastructure underlying machine learning model development. Data access can bottleneck end-to-end machine learning pipelines as training data volumes grow and as large model files become more common in serving. For instance, data loading can consume nearly 80% of total model training time, leaving GPU utilization below 30%. Loading large model files for deployment to production can likewise be slow due to network or storage read bottlenecks. These challenges are prevalent when popular frameworks like PyTorch, Ray, or HuggingFace are paired with cloud object storage such as S3 or GCS, or when models are downloaded from the HuggingFace model hub.
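Before optimizing, it helps to quantify the bottleneck. Below is a minimal sketch, not taken from the talk, of the common PyTorch pattern for measuring how much of each training step is spent waiting on data versus computing on the GPU; the function name `profile_epoch` and the model/dataset arguments are placeholders:

```python
import time

import torch
from torch import nn
from torch.utils.data import DataLoader


def profile_epoch(model, dataset, device="cuda"):
    """Run one training epoch and report the share of time spent on data loading."""
    loader = DataLoader(dataset, batch_size=64, num_workers=4, shuffle=True)
    model = model.to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    data_time = 0.0
    total_time = 0.0
    end = time.perf_counter()
    for inputs, labels in loader:
        # Time spent blocked waiting on the DataLoader (read, decode, collate).
        data_time += time.perf_counter() - end

        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), labels)
        loss.backward()
        optimizer.step()
        if device == "cuda":
            torch.cuda.synchronize()  # flush async GPU work so timings are accurate
        total_time += time.perf_counter() - end
        end = time.perf_counter()

    print(f"data loading: {100 * data_time / total_time:.1f}% of epoch time")
```

If the reported data-loading share approaches the 80% figure cited above, faster data access, rather than a bigger GPU, is the lever to pull.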

In this presentation, Lu and Siyuan will offer comprehensive insights into improving speed and GPU utilization for model training and serving. You will learn:

  • The data loading challenges hindering GPU utilization
  • The reference architecture for running PyTorch and Ray jobs while reading data from S3, with benchmark results of training ResNet50 and BERT (a minimal data-reading sketch follows this list)
  • Real-world examples of boosting model performance and GPU utilization through optimized data access
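
As a point of reference for the architecture discussed in the second bullet, here is a minimal sketch of a PyTorch dataset that reads training samples directly from S3. It is not the speakers' reference architecture; it assumes the `s3fs` package, and the bucket/prefix `my-training-data/images` is hypothetical:

```python
import io

import s3fs
import torch
from PIL import Image
from torch.utils.data import DataLoader, Dataset
from torchvision import transforms


class S3ImageDataset(Dataset):
    """Map-style dataset that reads image objects directly from an S3 prefix."""

    def __init__(self, prefix="my-training-data/images"):  # hypothetical bucket/prefix
        self.fs = s3fs.S3FileSystem()   # picks up AWS credentials from the environment
        self.keys = self.fs.ls(prefix)  # one S3 object per training sample
        self.transform = transforms.Compose([
            transforms.Resize((224, 224)),  # ResNet50's expected input size
            transforms.ToTensor(),
        ])

    def __len__(self):
        return len(self.keys)

    def __getitem__(self, idx):
        with self.fs.open(self.keys[idx], "rb") as f:
            image = Image.open(io.BytesIO(f.read())).convert("RGB")
        label = 0  # placeholder; a real dataset derives labels from the key layout
        return self.transform(image), torch.tensor(label)


# Multiple workers overlap S3 reads with GPU compute; the caching and
# prefetching approaches covered in the talk attack the same bottleneck.
loader = DataLoader(S3ImageDataset(), batch_size=64, num_workers=8)
```

Reading every sample over the network like this is exactly what makes optimized data access layers worthwhile, which is the subject of the benchmarks in the presentation.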

Video:

Presentation slides:


Speakers:


Lu Qiu is a Data & AI Platform Tech Lead at Alluxio and a PMC maintainer of the open source project Alluxio. Lu develops big data solutions for AI/ML training. Before that, Lu was responsible for core Alluxio components, including leader election, journal management, and metrics management. Lu received an M.S. in Data Science from George Washington University.


Siyuan Sheng is a Senior Software Engineer at Alluxio. Previously, he worked as a software engineer on Rubrik's Appflows team. Siyuan received his M.S. in Computer Science from CMU. He also loves snowboarding in his spare time.