AI/ML Infra Meetup | A Faster and More Cost Efficient LLM Inference Stack
January 23, 2025
By Junchen Jiang, Assistant Professor of Computer Science, University of Chicago

The cost of LLM inference can be huge, particularly with long contexts. In this on-demand video, Junchen Jiang, Assistant Professor at the University of Chicago, presents a 10x solution for long-context inference: an easy-to-deploy stack built over multiple vLLM engines with a tailored KV-cache backend.
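To make the idea of a shared KV-cache backend concrete, here is a minimal conceptual sketch. It is not the actual vLLM or LMCache API; the names (`SharedKVBackend`, `store_prefix`, `lookup_prefix`) are hypothetical. The point it illustrates is that when multiple engines serve requests over the same long context, the KV cache for that context can be stored once and reused, so subsequent engines skip the expensive prefill step.

```python
# Conceptual sketch only: a toy prefix-keyed KV store illustrating KV-cache
# reuse across inference engines. All names here are hypothetical and do not
# reflect the real vLLM/LMCache interfaces.
import hashlib
from typing import Dict, List, Optional


class SharedKVBackend:
    """Toy in-memory backend keyed by a hash of the token prefix (hypothetical)."""

    def __init__(self) -> None:
        self._store: Dict[str, bytes] = {}

    @staticmethod
    def _prefix_key(tokens: List[int]) -> str:
        # Hash the token prefix so identical long contexts map to one entry.
        return hashlib.sha256(str(tokens).encode("utf-8")).hexdigest()

    def store_prefix(self, tokens: List[int], kv_blob: bytes) -> None:
        # An engine that has already prefilled this context saves its KV cache.
        self._store[self._prefix_key(tokens)] = kv_blob

    def lookup_prefix(self, tokens: List[int]) -> Optional[bytes]:
        # A hit lets another engine skip prefill for the shared long context.
        return self._store.get(self._prefix_key(tokens))


if __name__ == "__main__":
    backend = SharedKVBackend()
    long_context = list(range(10_000))          # stand-in for a long prompt
    backend.store_prefix(long_context, b"kv")   # engine A saves its KV cache
    assert backend.lookup_prefix(long_context)  # engine B reuses it (cache hit)
```

The design choice this sketches is the one named in the talk's abstract: keeping the KV cache in a backend that sits outside any single vLLM engine, so long-context prefill work is paid once rather than per engine or per request.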

Videos:
Presentation Slides:
