Alluxio Blog

Millions Saved Annually: Unleashing the Power of Alluxio + HDFS at Uber

May 29, 2023 By Bin Fan, Beinan Wang, Shouwei Chen, Bowen Ding, Jiaming Mai, Jianjian Xie and Hope Wang

In October 2022, Uber’s Presto team shared in a blog post how they use the Alluxio SDK cache to boost Presto query performance and cost efficiency. This achievement is a major milestone in the collaboration between Alluxio and Uber. Thus far, the Uber Presto team has implemented the Alluxio SDK cache in three production clusters spanning over …

Announcing Our First AI 🤖 PMC Member: CacheGPT

April 1, 2023 By Bin Fan, Yuyang Wang, Beinan Wang and Hope Wang

We are thrilled to announce that CacheGPT, a state-of-the-art natural language generation model, has joined the Alluxio Project Management Committee (PMC) as our newest member! CacheGPT has been an active contributor to Alluxio since the beginning of this year. It reviews pull requests and drafts documentation using only emojis! See our new emoji-enriched documentation here! …

Alipay: Optimizing Alluxio for Efficient Large-Scale Training on Billions of Files

March 3, 2023 By Chuanying Chen (Ant Group)

Chuanying Chen, Senior Software Engineer at Ant Group, provides a deep dive into the practices of optimizing Alluxio for reliable, scalable, and high-performance large-scale training on billions of files. 1. Background Ant Group, formerly known as Ant Financial, is an affiliate company of the Chinese conglomerate Alibaba Group. The group owns the world’s largest mobile …

Cross Cluster Synchronization in Alluxio – Part 3: Discussions and Conclusion

February 8, 2023 By Tyler Crain

Following part 1 and part 2, this final post in the series discusses some design decisions and details, as well as future work. Discussions and Future Work: Why not exactly-once delivery for pub/sub? As we know, exactly-once message delivery for pub/sub would greatly simplify our design, and there do exist many powerful …

Cross Cluster Synchronization in Alluxio – Part 2: Mechanism

February 8, 2023 By Tyler Crain

This is part 2 of the blog series on the design and implementation of the Cross Cluster Synchronization mechanism in Alluxio. In the previous post, we discussed the scenario, the background, and how metadata sync is done within a single Alluxio cluster. This post describes how metadata sync is built upon to provide metadata …

Cross Cluster Synchronization in Alluxio – Part 1: Scenarios and Background

February 8, 2023 By Tyler Crain

This blog series discusses the design and implementation of the Cross Cluster Synchronization mechanism in Alluxio. This mechanism ensures that metadata stays consistent when running multiple Alluxio clusters. Part 1 of the series covers the scenario and background. Alluxio lies between the storage and compute layers in order to …

“Data Access as a Service” at Shopee: Using Alluxio to Accelerate Interactive Queries and Enhance Developer Experience with Flexible APIs

January 30, 2023 By Tianbao Ding (Shopee) and Haoning Sun (Shopee)

Shopee is the leading e-commerce platform in Southeast Asia. In this blog, Tianbao Ding and Haoning Sun from Shopee’s data infrastructure team share their project on query acceleration and “Data Access as a Service.” They describe how Shopee leverages Alluxio to improve Trino query performance by ~55% and how Alluxio enhances developer experience by providing …

Get Started with Trino and Alluxio in 5 Minutes

January 27, 2023 By Brian Olsen (Trino Developer Advocate), Beinan Wang and Hope Wang

This blog post discusses the synergy between Trino and Alluxio and how to deploy Alluxio as the caching layer for Trino. You will learn: why you should choose Alluxio as a cache for Trino; how Trino and Alluxio work together; how to configure Alluxio to point to S3-compatible storage such as MinIO; how to query …