AI/ML Infra Meetup | Best Practice for LLM Serving in the Cloud
June 17, 2025
By Nilesh Agarwal, Co-founder & CTO @ Inferless

Nilesh Agarwal, Co-founder & CTO at Inferless, shares insights on accelerating LLM inference in the cloud with Alluxio, tackling key bottlenecks such as slow model weight loading from S3 and lengthy container startup times. Inferless uses Alluxio as a three-tier cache that cuts model load times by 10x.
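
The core idea can be sketched in a few lines: rather than downloading weights from S3 on every cold start, the inference process reads them through a POSIX path backed by Alluxio's cache tiers (Alluxio exposes cached object storage via a FUSE mount). The snippet below is a minimal illustration under that assumption, not Inferless's actual implementation; the mount point and model directory names are hypothetical.

```python
# Minimal sketch: load model weights through an Alluxio FUSE mount instead of
# pulling them from S3 on every cold start. Paths below are hypothetical.
import os
import time

from transformers import AutoModelForCausalLM  # pip install transformers

ALLUXIO_MOUNT = "/mnt/alluxio/models"                     # assumed FUSE mount point
MODEL_DIR = os.path.join(ALLUXIO_MOUNT, "my-org/my-llm")  # assumed cache layout

start = time.perf_counter()
# On a warm cache this is a local (memory/NVMe-tier) read rather than a
# multi-gigabyte S3 download, which is where the load-time savings come from.
model = AutoModelForCausalLM.from_pretrained(MODEL_DIR)
print(f"weights loaded in {time.perf_counter() - start:.1f}s")
```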

Videos:
Presentation Slides:
