Alluxio AI Infra Day 2024

AI Infra Day | The AI Infra in the Generative AI Era

AI Infra Day | Accelerate Your Model Training and Serving with Distributed Caching

AI Infra Day | Model Lifecycle Management Quality Assurance at Uber Scale

AI Infra Day | Composable PyTorch Distributed with PT2 @ Meta

AI Infra Day | The Generative AI Market And Intel AI Strategy and Product Update

AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kubernetes


White Paper
Optimizing I/O for AI Workloads in Geo-Distributed GPU Clusters
Building reliable, high-performance AI/ML infrastructure can be challenging, especially with a constrained budget in a multi-GPU world: infrastructure teams have to leverage GPUs wherever they are available. This requires moving data across regions and clouds, which makes remote data access slow, complex, and expensive. This white paper introduces the common causes of slow AI workloads and low GPU utilization, explains how to diagnose the root cause, and offers solutions to the most common cause of underutilized GPUs.
GPU Acceleration
Cloud Cost Savings
Hybrid Multi-Cloud
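As a rough illustration of the kind of diagnosis this white paper covers, the sketch below separates the time a training step spends waiting on data from the time it spends on GPU compute; the model, loader, and device here are placeholders for the example, not details from the paper.

```python
# Minimal sketch (not from the white paper): measure how long each training
# step waits on the DataLoader versus how long it spends on GPU compute.
import time
import torch
import torch.nn.functional as F

def profile_epoch(model, loader, device="cuda"):
    data_time = compute_time = 0.0
    end = time.perf_counter()
    for batch, labels in loader:
        data_time += time.perf_counter() - end      # time spent waiting on I/O
        start = time.perf_counter()
        batch, labels = batch.to(device), labels.to(device)
        loss = F.cross_entropy(model(batch), labels)
        loss.backward()
        torch.cuda.synchronize()                    # include GPU kernel time
        compute_time += time.perf_counter() - start
        end = time.perf_counter()
    # If data wait dominates, the GPUs are being starved by slow remote reads.
    print(f"data wait: {data_time:.1f}s  compute: {compute_time:.1f}s")
```

If the data-wait share dominates, remote I/O rather than compute is the bottleneck, which is the situation the paper's caching-based remedies target.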


White Paper
Meet in the Middle for a 1,000x Performance Boost Querying Parquet Files on Petabyte-Scale Data Lakes
This white paper describes how to leverage Alluxio as a high-performance caching and acceleration layer atop hyperscale data lakes for querying Parquet files. Without specialized hardware, changes to data formats or object addressing schemes, or migrating data out of the data lake, Alluxio delivers sub-millisecond Time-to-First-Byte (TTFB) performance comparable to AWS S3 Express One Zone. Furthermore, Alluxio’s throughput scales linearly with cluster size: a modest 50-node deployment can achieve one million queries per second, surpassing the single-account throughput of S3 Express by 50× without latency degradation.
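As a hedged illustration of the access pattern described above, the sketch below reads a Parquet file from a data lake path exposed through an Alluxio POSIX (FUSE) mount; the mount point and dataset path are assumptions made for the example, not details from the paper.

```python
# Minimal sketch: query a cached Parquet file through an Alluxio FUSE mount.
# The mount point (/mnt/alluxio) and file path are hypothetical.
import pyarrow.parquet as pq

table = pq.read_table(
    "/mnt/alluxio/datalake/events/part-0000.parquet",
    columns=["user_id", "event_type"],  # column pruning keeps the read small
)
print(table.num_rows)
```

Because the application sees an ordinary file path, no data format or object addressing changes are needed, which matches the paper's premise.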


Blog
How Coupang Leverages Distributed Cache to Accelerate Machine Learning Model Training
In a recent Alluxio-hosted virtual tech talk, Hyun Jung Baek, Staff Backend Engineer at Coupang, presented "How Coupang Leverages Distributed Cache to Accelerate ML Model Training." This blog post summarizes key insights from the presentation on Coupang's approach to distributed caching and how it has transformed their multi-region, hybrid-cloud machine learning platform.
GPU Acceleration
Hybrid Multi-Cloud
Model Training Acceleration
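For a concrete picture of the pattern discussed in this post, here is a minimal, hypothetical sketch of a PyTorch dataset reading training samples through a cache-backed mount point; the paths and file format are assumptions for illustration, not Coupang's actual setup.

```python
# Minimal sketch (hypothetical paths): training data is read through a
# cache-backed POSIX mount instead of directly from remote object storage,
# so repeat epochs are served from the distributed cache.
import glob
import torch
from torch.utils.data import Dataset, DataLoader

class CachedTensorDataset(Dataset):
    def __init__(self, root="/mnt/cache/training/samples"):
        self.paths = sorted(glob.glob(f"{root}/*.pt"))

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        sample = torch.load(self.paths[idx])  # served from cache on re-reads
        return sample["features"], sample["label"]

loader = DataLoader(CachedTensorDataset(), batch_size=64, num_workers=8)
```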



On Demand Videos
Tech Talk: How Coupang Leverages Distributed Cache to Accelerate ML Model Training
In this tech talk, Hyun Jung Baek, Staff Backend Engineer at Coupang, shares best practices for leveraging distributed caching to power search and recommendation model training infrastructure.
Model Training Acceleration


Blog
Uptycs Chooses Alluxio to Power GenAI Natural Language Analytics at Terabyte Scale
Suresh Kumar Veerapathiran and Anudeep Kumar, engineering leaders at Uptycs, recently shared how they evolved their data platform and analytics architecture to power analytics through a generative AI interface. In their Medium post, "Cache Me If You Can: Building a Lightning-Fast Analytics Cache at Terabyte Scale," Veerapathiran and Kumar provide detailed insights into the challenges they faced (and how they solved them) while scaling an analytics solution that collects and reports on terabytes of telemetry data per day as part of the Uptycs Cloud-Native Application Protection Platform (CNAPP).
Large Scale Analytics Acceleration

Blog
AI/ML Infra Meetup at Uber Seattle: Tackling Scalability Challenges of AI Platforms
Co-hosted by Alluxio and the Uber AI team on March 6, 2025, at Uber's Seattle office and via Zoom, the AI/ML Infra Meetup is a community event for developers focused on building AI, ML, and data infrastructure at scale. Speakers from Uber, Snap, and Alluxio delivered talks sharing insights and real-world examples on LLM training, fine-tuning, deployment, designing scalable architectures, GPU optimization, and building recommendation systems.

Case Study
RedNote Accelerates Model Training & Distribution with Alluxio
By leveraging Alluxio Distributed Cache, RedNote eliminated the storage bottlenecks that caused model training times to exceed SLAs, accelerated cross-cloud model distribution, and lowered model distribution costs.
GPU Acceleration
Model Training Acceleration
Model Distribution
Cloud Cost Savings


Case Study
Search and Recommendation AI Model Training Acceleration for Top 10 Global E-commerce Giant
A publicly traded, top 10 global e-commerce company leverages Alluxio Enterprise AI to accelerate training of its search and recommendation AI models and cut AWS S3 API and egress charges by over 50%.
Model Training Acceleration
Cloud Cost Savings

