Mengyu Hu and Chengkun Jia, both from Zhihu’s data platform team, discuss their evolution from HDFS to Alluxio as a high-performance data access layer for LLM training and serving. Alluxio has accelerated model training by 2-3x, increased GPU utilization to 90%, and enabled model deployments every minute instead of every few hours or days.
Listen to our latest podcast to learn how the field of analytics and AI is changing, the key challenges facing data platforms, and the different approaches to addressing them.
As enterprises move to the cloud, many are getting sticker shock from out-of-control cloud costs. This Q&A covers several best practices that can help reduce egress costs as businesses scale their cloud usage and continue to evolve their data platform architecture.
Mini Video Series
We release new videos every two weeks. Subscribe to our channel and stay tuned!
The Getting Started with Alluxio on Kubernetes series is complete! Learn about the architecture, deployment, and best practices of Alluxio on Kubernetes from Shawn Sun, Software Engineer at Alluxio.
Our latest mini video series is out! Explore Expedia Group’s data landscape, see why data replication was not the right solution, and learn how Expedia Group reduced egress costs by unifying cross-region access in the cloud.
Past events on-demand
When training models on ultra-large datasets, one of the biggest challenges is low GPU utilization. These powerful processors are often underutilized due to inefficient I/O and data access. This mismatch between computation and storage leads to wasted GPU resources, low performance, and high cloud storage costs. The rise of generative AI and GPU scarcity is only making this problem worse.
In this webinar, Tarik and Beinan discuss strategies for transforming idle GPUs into optimal powerhouses, focusing on cost-effective management of ultra-large datasets for AI and analytics.
June was a month packed full of talks! Take a look at what our team has been up to:
- PrestoCon Day
- Trino Fest | Trino Optimization With Distributed Caching on Data Lake – Beinan Wang & Hope Wang, Alluxio
- Data + AI Summit | Data Caching Strategies for Data Analytics and AI – Beinan Wang & Chunxu Tang
By David Loshin | President of Knowledge Integrity
In today’s competitive landscape, companies are eager to harness the power of AI for competitive advantage. However, accessing and utilizing GPUs effectively often requires extensive data engineering to manage data copies or specialized storage, leading to out-of-control cloud and infrastructure costs. Join us for this TDWI webinar to learn about the infrastructure hurdles associated with AI/ML model training and deployment and how to overcome these challenges. Topics include:
- The challenges of AI and model training
- GPU utilization in machine learning and the need for specialized hardware
- Managing data access and maintaining a source of truth in data lakes
- Best practices for optimizing ML training
Got a tech question for the Alluxio Community? Chat with us on Slack!
Be our stargazers on GitHub ⭐
If you like our product, please give it a star on GitHub, and share the goodness!
We currently have 30+ opportunities across the globe! Learn more about our job openings on the Customer Success, Sales, Product, and Engineering teams. Are you a great fit, or do you know someone to refer? Check out the full list of opportunities and apply here.