Alluxio Community Newsletter

2023 Recap: Data & AI Digest

ALL THINGS AI

We’ve compiled a collection of 2023’s most popular content according to our readers. In case you missed anything, here’s your chance to catch up on best-practice ebooks, technical blogs, hands-on videos, webinars, and more!

Building High-performance Data Access Layer for Model Training and Model Serving for LLM | User Blog

Mengyu Hu and Chengkun Jia, both from Zhihu’s data platform team, discuss their evolution from HDFS to Alluxio as a high-performance data access layer for LLM training and serving. Alluxio has accelerated model training by 2-3x, increased GPU utilization to 90%, and enabled models to be deployed every minute rather than every few hours or days.

Read Now

Efficient Data Access Strategies For Large-scale AI | Whitepaper

Get a comprehensive understanding of data access patterns in a modern AI/ML platform. This white paper discusses the characteristics of data access in each stage of the machine learning pipeline and the solutions that can be used in architecting your data and AI platform.

Read Now

Maximize GPU Utilization for Model Training | On-demand Webinar

When training models on ultra-large datasets, one of the biggest challenges is low GPU utilization. These powerful processors are often underutilized due to inefficient I/O and data access. This mismatch between computation and storage leads to wasted GPU resources, low performance, and high cloud storage costs. The rise of generative AI and GPU scarcity is only making this problem worse.

In this webinar, Tarik and Beinan discuss strategies for turning idle GPUs into fully utilized powerhouses. They focus on cost-effective management of ultra-large datasets for AI and analytics.

Watch Now

End-to-End Machine Learning Pipeline with Alluxio | 3min Demo

Watch the Alluxio Enterprise AI end-to-end ML pipeline demo, and see for yourself the significant performance improvements as well as increased GPU utilization! Alluxio’s Solution Engineer Tarik Bennett walks through a short end-to-end machine learning pipeline with Alluxio provisioned or mounted as a local folder for the PyTorch DataLoader.

Watch Now

Solving the Data Loading Challenge for Machine Learning with Alluxio | 3min Demo

Alluxio’s Senior Solutions Engineer Roland Theron shares how Alluxio benefits model training workflows by reducing data loading times, allowing for better utilization of your compute resources.

Watch Now

PyTorch Model Training Performance Tuning: A Comprehensive Guide | Ebook | Top tips to boost your training speed by 5-10x

Discover easy-to-apply tuning tips that deliver optimal training speeds at lower costs. Learn how to tune PyTorch performance to achieve lower latency and higher GPU utilization through data loading, data operations, GPU processing, and CPU processing, each illustrated with lines of code.

Read Now

Rise of the Data Access Layer for Analytics & AI | Analyst Research

Explore the transformative capabilities of the Data Access Layer and how it can simplify and accelerate your analytics and AI workloads. In this new research paper, Kevin Petrie, VP of Research at Eckerson Group, shares the architecture and use cases for a Data Access Layer and how it can help you meet your analytics and AI performance goals.

Read Now

COST SAVINGS AND OPTIMIZATION

Millions Saved Annually: Unleashing the Power of Alluxio + HDFS at Uber | User Blog

Find out the details of our joint project with Uber aimed at optimizing the performance of HDFS DataNodes. The project used the Alluxio SDK cache to manage SSD storage on each DataNode, resulting in improved performance and a better return on investment. Although the SSD cache occupies only 0.6% of the total disk space, it handles 60% of the overall client traffic.

Read Now

The Ultimate Guide to Saving Data Egress Costs in the Cloud | Ebook

Build your data platform with reduced cloud egress costs and never be surprised by a bill again. Minimize your data replication with optimized data pipelines and data flow for your architecture.

Read Now

Shopee: Query Acceleration & Data Access as a Service | User Blog

Learn how Shopee, the leading e-commerce platform in Asia, has leveraged Alluxio to improve Trino query performance by ~55%. In addition, Alluxio enhances the developer experience by providing flexible data access through Data APIs.

Read Now

The Trino Optimization Handbook | Ebook | Best Practices and Tuning Tips

Unlock the full potential of Trino and transform your data analytics game. Identify bottlenecks and maximize your Trino query performance with configuration settings and session properties.

If you’re using Presto (PrestoDB), check out The Presto Optimization Handbook here.

Read Now

Got a tech question for the Alluxio Community? Chat with us on Slack!

Be our stargazers on GitHub ⭐

If you like our product, please give it a star on GitHub, and share the goodness!

Slack is our main hub for technical support as you use Alluxio, and the place to stay up to date with our latest news and events.

Join Slack

We host monthly in-person and online events. Come meet the Alluxio team and dive into technical discussions with data and AI/ML enthusiasts.

See upcoming events

We welcome you to contribute to the Alluxio Open Source project!

Contribute

Sign Up for a Live Demo or Book a Meeting with a Solutions Engineer

Request a demo