Alluxio Community Newsletter - July 2023

Highlights

Zhihu Case Study Blog | Building High-performance Data Access Layer for Model Training and Model Serving for LLM

Mengyu Hu and Chengkun Jia, both from Zhihu’s data platform team, discuss their evolution from HDFS to Alluxio as a high-performance data access layer for LLM training and serving. Alluxio has accelerated model training by 2-3x, increased GPU utilization to 90%, and enabled model deployment every minute instead of hours or days.

Read Case Study

Alluxio Enterprise Release | 2.10 is here!

We are thrilled to announce the release of Alluxio Enterprise 2.10! We have made significant progress in improving high availability and reducing resource consumption. You can now scale Alluxio with improved speed at lower cost.

Read Product Blog

Great Things with Great Tech Podcast | Fast and Efficient Hybrid Data Access with Alluxio

Listen to the latest podcast to learn how the field of analytics and AI has been changing, the key challenges data platforms face, and the different approaches to addressing their needs.

Listen to Podcast

TDWI Article | Executive Q&A on Controlling Cloud Egress Costs

As enterprises move to the cloud, many are getting sticker shock from out-of-control cloud costs. This Q&A covers several best practices that can help reduce egress costs as businesses scale their cloud usage and continue to evolve their data platform architecture.

Read Article

Mini Video Series

We have new videos releasing every 2 weeks. Subscribe to our channel and stay tuned!

Getting Started with Alluxio on Kubernetes

Getting Started with Alluxio on Kubernetes is complete! Learn about the architecture, deployment and best practices of Alluxio on Kubernetes with Shawn Sun, Software Engineer at Alluxio.

Expedia Group’s User Journey with Alluxio

We have the latest mini video series coming out! Explore Expedia Group’s data landscape, see why data replication was not the right solution, and learn how Expedia Group reduced egress costs by unifying cross-region access in the cloud.

Part I | Explore Expedia Group’s Data Landscape

Past Events On-Demand

On-demand Webinar | Maximize GPU Utilization for Model Training

When training models on ultra-large datasets, one of the biggest challenges is low GPU utilization. These powerful processors are often underutilized due to inefficient I/O and data access. This mismatch between computation and storage leads to wasted GPU resources, low performance, and high cloud storage costs. The rise of generative AI and GPU scarcity is only making this problem worse.

In this webinar, Tarik and Beinan discuss strategies for transforming idle GPUs into optimal powerhouses. They focus on cost-effective management of ultra-large datasets for AI and analytics.

Watch Now

June was a month packed full of talks! Take a look at what our team has been up to:

PrestoCon Day
- Speeding Up Presto in ByteDance – Shengxuan Liu, ByteDance & Beinan Wang, Alluxio
- Presto on ARM – Chunxu Tang & Jiaming Mai, Alluxio
Trino Fest | Trino Optimization With Distributed Caching on Data Lake – Beinan Wang & Hope Wang, Alluxio
Data + AI Summit | Data Caching Strategies for Data Analytics and AI – Beinan Wang & Chunxu Tang, Alluxio

Upcoming Events

[New Weekly Event] | Alluxio PR Power Hour

We have a new WEEKLY event called Alluxio PR Power Hour! Get live feedback on your GitHub PRs/Issues or join to learn about what others are working on. Every Thursday 8pm PDT // Friday 11am CST. You can find more details and past meeting notes here.

TDWI Webinar | Laying the Groundwork for AI: Addressing Infrastructure Hurdles for Optimal Model Training | July 25 9:00am PT

By David Loshin | President of Knowledge Integrity

In today’s competitive landscape, companies are eager to harness the power of AI to gain an advantage. However, efforts to access and use GPUs effectively often lead to extensive data engineering, such as managing data copies or specialized storage, which in turn leads to out-of-control cloud and infrastructure costs. Join us for this TDWI webinar to learn more about the infrastructure hurdles associated with AI/ML model training and deployment and how to overcome these challenges. Topics include:

- The challenges of AI and model training
- GPU utilization in machine learning and the need for specialized hardware
- Managing data access and maintaining a source of truth in data lakes
- Best practices for optimizing ML training

Register Now

Got a tech question for the Alluxio Community? Chat with us on Slack!

WHITEPAPERS

“Zero-Copy” Hybrid Bursting with no App Changes

Alluxio Architecture and Data Flow

Evaluating Apache Spark and Alluxio for Data Analytics: Benchmarking Recommendations and Results

Spark with Alluxio Overview – Pair Spark with Alluxio to Modernize Your Data Platform

Presto with Alluxio Overview – Architecture Evolution for Interactive Queries

Accelerating Machine Learning / Deep Learning in the Cloud: Architecture and Benchmark

Be our stargazers on GitHub ⭐

If you like our product, please give it a star on GitHub, and share the goodness!

HOT JOBS

We currently have 30+ opportunities across the globe! Learn more about our job openings on the Customer Success, Sales, Product, and Engineering teams. Are you awesome, or do you know someone to refer? Check out the full list of opportunities and apply here.

Senior Account Support Engineer (San Mateo, California)

Senior Solutions Engineer (San Mateo, California)

Senior Account Executive (San Mateo, California)

Software Engineering Manager (San Mateo, California)