Metadata synchronization (sync) is a core feature in Alluxio that keeps files and directories consistent with their source of truth in under storage systems, making it simple for users to reason about the data retrieved from Alluxio. Understanding the internal process is also important for tuning performance. This article describes how Alluxio is designed and implemented to keep metadata synchronized.
Welcome to the Alluxio Community
Founder Haoyuan Li was a Ph.D. student at UC Berkeley AMPLab when he built the beginnings of Alluxio (originally called Tachyon) with the mission to orchestrate data for all data-driven applications. Today, in addition to running critical workloads for thousands of users across the world, Alluxio has a vibrant community that has made countless contributions to the open source project.
Welcome to the Alluxio community! We invite you to read about, try out, use, and contribute to Alluxio, and to share your experience, feedback, suggestions, and ideas!
Product School
Learn how Alluxio uses Apache Ranger’s centralized access policies to control access to virtual paths in the Alluxio virtual file system and to enforce existing access policies for HDFS under stores.

Slack
Ask questions, get answers.

GitHub
Become a contributor.

MAILING LIST
Join the Google group.
Join our channel.

This whitepaper describes how to speed up end-to-end distributed training in the cloud by using Alluxio to accelerate data access. With the help of Alluxio, …
Alluxio is a data orchestration platform that unifies data silos across heterogeneous environments. The following blog post discusses an architecture that combines Spark with Alluxio.
Unisound is an artificial intelligence company focusing on Internet of Things services. Unisound’s AI technology stack includes perception and expression capabilities for signals, voice, images, and text, as well as cognitive technologies such as knowledge, understanding, analysis, and decision-making, working toward a multi-modal AI system. Atlas is the supercomputing platform that supports all kinds of AI applications, including model training and inference.
How T3Go built a high-performance data lake with Apache Hudi and Alluxio that cut the time to ingest data into the lake by up to a factor of 2. Data analysts using Presto, Hudi, and Alluxio together to query data on the lake saw queries run 10 times faster.
When applications only read and write through Alluxio, the Alluxio file system provides strong consistency. However, when clients write data both through Alluxio and directly to under storage, consistency depends on the Alluxio write type and the under storage type. This article discusses what to expect in each scenario.
Join an Alluxio Community Event Near You
Don’t see an event in your area? Want to start a local Alluxio meetup? Drop us a note!
ACADEMIC PAPERS

Tachyon: Reliable, Memory Speed Storage for Cluster Computing Frameworks
ACM Symposium on Cloud Computing 2014

Alluxio: A Virtual Distributed File System
Berkeley EECS Ph.D. Dissertation
Join the Community
4,000+ Stars
1,000+ Contributors
1 Million+ Downloads

Apache 2.0 Licensed

Alluxio Contributors
The Alluxio open source contributors and Project Management Committee members come from diverse and experienced backgrounds. The project members include committers with decades of experience from Tencent, Google, Palantir, UC Berkeley, Carnegie Mellon, IBM, Intel, and JD.com.