Alluxio Blog

Speed Up Uber’s Presto with Alluxio | A collaboration between Uber and Alluxio – part 1

May 24, 2022 By Chen Liang and Beinan Wang

This article shares how Uber and Alluxio collaborated to design and implement a Presto local cache that reduces HDFS latency.

Deep Dive into the Implementation of Alluxio Metadata Storage

May 18, 2022 By Changsheng Gu

This article introduces the design and implementation of metadata storage in the Alluxio Master, both on heap and off heap (based on RocksDB).

What’s New in Alluxio 2.8: Enhanced S3 API Functionality, Enterprise-grade Security and Data Migration With Better Usability and Low Cost

May 4, 2022 By Adit Madan and Hope Wang

The Alluxio 2.8 version focuses on the S3 API, enterprise-grade security, scalability and observability in data migration. Enhanced S3 API makes managing Alluxio easier than ever. Features such as encryption at rest and policy-driven data management further improve Alluxio’s functionality to support enterprise customers.

From Zookeeper to Raft: How Alluxio Stores File System State with High Availability and Fault Tolerance

April 13, 2022 By Tyler Crain

Raft is an algorithm for state machine replication that ensures high availability (HA) and fault tolerance. This blog shares how Alluxio has moved to a Zookeeper-less, built-in, Raft-based journal system as its HA implementation.

Recommendations to Level Up Your Machine Learning Platform

April 12, 2022 By Bin Fan

With machine learning (ML) and artificial intelligence (AI) applications becoming more business-critical, organizations are in the race to advance their AI/ML capabilities. To realize the full potential of AI/ML, having the right underlying machine learning platform is a prerequisite.

Orchestrating Data for Machine Learning Pipelines

April 8, 2022 By Bin Fan

This article discusses a new solution for orchestrating data in end-to-end machine learning pipelines. I will outline common challenges and pitfalls, then propose a new technique, data orchestration, to optimize the data pipeline for machine learning.

From Cache to Cash: Introducing NFT for Data Orchestration

April 1, 2022 By Bin Fan and Hope Wang

Today, we are excited to announce the launch of Non-fungible token (NFT) as a new feature in our leading data orchestration platform.

Improving Presto Architectural Decisions with Alluxio Shadow Cache at Meta (Facebook)

March 30, 2022 By Ke Wang and Zhenyu Song

In a collaboration between Meta (Facebook), Princeton University, and Alluxio, we have developed “Shadow Cache”, a lightweight Alluxio component that tracks the working set size and the infinite-cache hit ratio. Shadow Cache dynamically tracks the working set size over a past time window and is implemented as a series of Bloom filters. It is deployed in Presto at Meta (Facebook), where it is used to understand system bottlenecks and to inform routing design decisions.
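
To make the idea of "a series of Bloom filters" over a past window more concrete, here is a minimal Python sketch. It is not Alluxio's implementation: the class names `BloomFilter` and `ShadowCacheSketch`, the parameters, and the rotation scheme are all hypothetical, and "infinite-cache hit ratio" is assumed here to mean the hit ratio a cache with unlimited capacity would achieve over the window. Each filter covers one sub-window; rotating drops the oldest filter, and the working set size is estimated from the number of set bits in the union of all filters.

```python
import hashlib
import math
from collections import deque


class BloomFilter:
    """Tiny Bloom filter; a Python int serves as the bit array (illustrative only)."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, key):
        # Double hashing: derive all positions from two halves of one digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def __contains__(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class ShadowCacheSketch:
    """Hypothetical sliding-window tracker built from several Bloom filters."""

    def __init__(self, num_windows=4, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.filters = deque(
            BloomFilter(num_bits, num_hashes) for _ in range(num_windows)
        )
        self.hits = 0
        self.accesses = 0

    def access(self, key):
        # A key seen anywhere in the window counts as a hit for an
        # unlimited-capacity cache (up to Bloom filter false positives).
        self.accesses += 1
        if any(key in f for f in self.filters):
            self.hits += 1
        self.filters[-1].add(key)

    def rotate(self):
        # Called once per sub-window: forget the oldest filter, start a new one.
        self.filters.popleft()
        self.filters.append(BloomFilter(self.num_bits, self.num_hashes))

    def working_set_size(self):
        # Standard Bloom filter cardinality estimate on the union of all filters:
        # n ~= -(m / k) * ln(1 - X / m), where X is the number of set bits.
        union = 0
        for f in self.filters:
            union |= f.bits
        set_bits = bin(union).count("1")
        if set_bits >= self.num_bits:
            return float("inf")
        return -(self.num_bits / self.num_hashes) * math.log(
            1 - set_bits / self.num_bits
        )

    def hit_ratio(self):
        return self.hits / self.accesses if self.accesses else 0.0
```

In this sketch a caller would invoke `access(key)` on every read and `rotate()` on a timer, so the estimate always reflects roughly the last `num_windows` sub-windows. The memory cost is fixed by `num_bits` and `num_windows` and does not grow with the number of distinct keys.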

Accelerate Auto Data Tagging with Alluxio and Spark in Hybrid Cloud – A Practice in WeRide

March 14, 2022 By Feifei Cai and Hao Zhu

This blog shares how WeRide, an autonomous driving technology company, uses Alluxio and Spark to accelerate its auto data tagging system.