Alluxio, the data platform company for all data-driven workloads, hosted the community event “AI Infra Day” on October 25, 2023. This virtual event brought together technology leaders working on AI infrastructure from Uber, Meta, and Intel to delve into the intricate aspects of building scalable, performant, and cost-effective AI platforms.
Bin Fan, Alluxio’s Chief Architect and VP of Open Source, kicked off the event with welcome remarks on the pivotal trends shaping the AI infrastructure landscape in the era of generative AI. The key takeaway from his keynote was the importance of AI and machine learning workloads and how quickly they drive infrastructure innovation. “As an engineer working on AI infrastructure, you need to catch up very closely with the hardware trend because, there’s a saying that, whenever the hardware capacity or performance has a 10x, you will need to totally re-architect your software or your service,” said Bin, “and there is emerging hardware technology that is improving at a constant speed.”
Then we delved into a diverse range of topics, from model lifecycle management to PyTorch APIs and more. Whether you didn’t get to join us virtually or you just want to rewatch your favorite session, we’ve compiled all of the videos and presentations from AI Infra Day in one place. Drill into the topics most relevant to you, from generative AI to model fine-tuning to Alluxio’s distributed caching features and more.
Model Lifecycle Management Quality Assurance at Uber Scale
Machine learning models power Uber’s everyday business. However, developing and deploying a model is not a one-time event but a continuous process that requires careful planning, execution, and monitoring. In this session, Sally (Mihyong) Lee, Senior Staff Engineer & TLM @ Uber, highlights Uber’s practices across the machine learning lifecycle for ensuring high model quality.
Accelerate Your Model Training and Serving with Distributed Caching
In this session, Adit Madan, Director of Product Management at Alluxio, presents an overview of using distributed caching to accelerate model training and serving. He explores the data access patterns in the ML pipeline and the requirements they impose, and offers best practices for using distributed caching in the cloud. This session features insights from real-world examples, such as AliPay, Zhihu, and more.
Composable PyTorch Distributed with PT2
In this talk, Wanchao Liang, Software Engineer on the PyTorch team at Meta, explores the technological advancements of PyTorch Distributed and dives into the details of how composing different PyTorch-native distributed training APIs makes multi-dimensional parallelism possible for training large language models.
Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kubernetes
This hands-on session discusses best practices for using PyTorch and Alluxio during model training on AWS. Alluxio’s Shawn Sun (Software Engineer) and Lu Qiu (Machine Learning Engineer) provide a step-by-step demonstration of how to use Alluxio on EKS as a distributed cache to accelerate computer vision model training jobs that read datasets from S3. They also present a benchmark comparing data loading duration for Alluxio Fuse, S3FS Fuse, and S3 Boto3, in which Alluxio Fuse proved to be 5 times faster than S3FS Fuse and more than 10 times faster than S3 Boto3. This architecture significantly improves GPU utilization from 30% to 90%+, achieves ~5x faster training, and lowers cloud storage costs.
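For readers who want a feel for what a data loading comparison of this kind involves, here is a minimal sketch in Python. It is not the benchmark from the session: it only times a single sequential read of every file under a directory, which could be a FUSE mount point. The function names `time_read_all` and `compare` and the mount paths are hypothetical and chosen for illustration.

```python
import os
import time


def time_read_all(root, chunk_size=1 << 20):
    """Read every file under `root` once; return (seconds, bytes_read)."""
    total_bytes = 0
    start = time.perf_counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            with open(os.path.join(dirpath, name), "rb") as f:
                # Read in fixed-size chunks so large files do not need to fit in memory.
                while True:
                    chunk = f.read(chunk_size)
                    if not chunk:
                        break
                    total_bytes += len(chunk)
    return time.perf_counter() - start, total_bytes


def compare(mounts):
    """Time each entry of `mounts` (label -> path), skipping paths that do not exist."""
    results = {}
    for label, path in mounts.items():
        if os.path.isdir(path):
            results[label] = time_read_all(path)
    return results


if __name__ == "__main__":
    # Hypothetical mount points (assumptions); replace with your own.
    mounts = {
        "alluxio-fuse": "/mnt/alluxio-fuse/dataset",
        "s3fs-fuse": "/mnt/s3fs/dataset",
    }
    for label, (seconds, nbytes) in compare(mounts).items():
        print(f"{label}: {nbytes} bytes in {seconds:.2f}s")
```

Repeated runs against the same path may be served from the operating system's page cache, so a fair comparison would need to account for caching at each layer and for the access pattern of the actual training job.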
The Generative AI Market, Intel AI Strategy and Product Update
ChatGPT and other massive models represent an amazing step forward in AI, yet they do not solve real-world business problems. In this session, Jordan Plawner, Global Director of Artificial Intelligence Product Manager and Strategy at Intel, surveys how the AI ecosystem has worked non-stop over the last year to take these all-purpose, multi-task models and optimize them so that organizations can use them to address domain-specific problems. He explains these new AI-for-the-real-world techniques and methods, such as fine-tuning, and how they can be applied to deliver results that are highly performant with state-of-the-art accuracy while also being economical to build and deploy everywhere to enhance products and services.