Building a Distributed File System For The Cloud-Native Era
November 7, 2022
Big Data Bellevue Meetup
May 19, 2022
Data engineering in modern enterprises has become increasingly complex and resource-consuming, particularly because (1) the wealth of organizational data is often distributed across data centers, cloud regions, or even cloud providers, and (2) the complexity of the big data stack has grown quickly over the past few years with an explosion of big-data analytics and machine-learning engines (such as MapReduce, Hive, Spark, Presto, TensorFlow, and PyTorch, to name a few).
To address these challenges, it is critical to provide a single logical namespace that federates different storage services, on-prem or cloud-native, and abstracts away data heterogeneity while providing data locality to improve computation performance. Bin Fan will share his observations and lessons learned from designing, architecting, and implementing such a system, the Alluxio open-source project, since 2015.
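To make the idea of a single logical namespace concrete, here is a minimal sketch of a mount table that maps logical paths onto different backing stores by longest-prefix match. It is an illustration under stated assumptions, not Alluxio's actual API: the `MountTable` class, its method names, and the example URIs are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MountTable:
    """Hypothetical mount table: logical path prefix -> backing-store URI."""
    mounts: dict = field(default_factory=dict)

    def mount(self, logical_prefix: str, backing_uri: str) -> None:
        # Normalize so "/logs/" and "/logs" name the same mount point.
        self.mounts[logical_prefix.rstrip("/") or "/"] = backing_uri.rstrip("/")

    def resolve(self, logical_path: str) -> str:
        # Choose the longest mounted prefix that contains the path.
        best = None
        for prefix in self.mounts:
            inside = (
                prefix == "/"
                or logical_path == prefix
                or logical_path.startswith(prefix + "/")
            )
            if inside and (best is None or len(prefix) > len(best)):
                best = prefix
        if best is None:
            raise FileNotFoundError(f"no mount covers {logical_path}")
        # Keep the part of the path below the mount point.
        rest = logical_path if best == "/" else logical_path[len(best):]
        return self.mounts[best] + rest


# Example URIs are made up for illustration.
table = MountTable()
table.mount("/", "hdfs://namenode:8020/warehouse")
table.mount("/logs", "s3://example-bucket/logs")
print(table.resolve("/logs/2022/05/19/events.json"))
print(table.resolve("/tables/orders"))
```

Applications address every file through the logical path, so data can move between backing stores by changing a mount entry rather than application code.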
Alluxio originated at UC Berkeley's AMPLab (where it was called Tachyon) and was initially proposed as a daemon service that lets Spark share RDDs across jobs for performance and fault tolerance. Today, it has become a general-purpose, high-performance, and highly available distributed file system that provides a generic data service and abstracts away complexity in data and I/O. Many companies and organizations, such as Uber, Meta, Tencent, TikTok, and Shopee, run Alluxio in production as a building block in their data platforms to create a data abstraction and access layer. We will talk about the journey of this open-source project, especially its design challenges in tiered metadata storage (based on RocksDB), the embedded replicated state machine (based on Raft) for HA, and the evolution of its RPC framework (based on gRPC).
Meetup Group
Big Data Bellevue: https://www.meetup.com/big-data-bellevue-bdb/
Videos
Bridging Speed and Scale: AWS S3 Data Caching for Low-Latency, Semantically-Rich AI Workloads

Amazon S3 and other cloud object stores have become the de facto storage system for organizations large and small. And it’s no wonder why. Cloud object stores deliver unprecedented flexibility with unlimited capacity that scales on demand and ensures data durability out-of-the-box at unbeatable prices.
Yet as workloads shift toward real-time AI, inference, feature stores, and agentic memory systems, S3’s latency and limited semantics become constraints. In this webinar, you’ll learn how to augment, rather than replace, S3 with a tiered architecture that restores sub-millisecond performance, richer semantics, and high throughput, all while preserving S3’s advantages of low-cost capacity, durability, and operational simplicity.
We’ll walk through:
- The key challenges posed by latency-sensitive, semantically rich workloads (e.g. feature stores, RAG pipelines, write-ahead logs)
- Why “just upgrading storage” isn’t sufficient — the bottlenecks in metadata, object access latency, and write semantics
- How Alluxio transparently layers on top of S3 to provide ultra-low latency caching, append semantics, and zero data migration with both FSx-style POSIX access and S3 API access
- Real-world results: achieving sub-ms time to first byte (TTFB), 90%+ GPU utilization in ML training, 80x faster feature store query response times, and dramatic cost savings from reduced S3 operations
- Trade-offs, deployment patterns, and best practices for integrating this tiered approach in your AI/analytics stack
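The caching tier described in the list above can be sketched as a read-through cache with least-recently-used eviction placed in front of a slower object store. This is a minimal sketch under stated assumptions, not how Alluxio is implemented: `ReadThroughCache`, the `fetch` callback, and the in-memory `object_store` dictionary standing in for S3 are all hypothetical.

```python
from collections import OrderedDict
from typing import Callable


class ReadThroughCache:
    """Hypothetical byte-bounded LRU cache in front of a slow object store."""

    def __init__(self, fetch: Callable[[str], bytes], capacity_bytes: int):
        self.fetch = fetch                # slow path, e.g. an object-store GET
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self.entries = OrderedDict()      # key -> bytes, oldest first
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> bytes:
        if key in self.entries:
            # Hit: mark as most recently used and serve from the cache.
            self.entries.move_to_end(key)
            self.hits += 1
            return self.entries[key]
        # Miss: read from the backing store, then try to cache the result.
        self.misses += 1
        data = self.fetch(key)
        self._admit(key, data)
        return data

    def _admit(self, key: str, data: bytes) -> None:
        if len(data) > self.capacity_bytes:
            return  # too large to cache; it is served without being admitted
        # Evict least recently used entries until the new object fits.
        while self.used_bytes + len(data) > self.capacity_bytes:
            _, evicted = self.entries.popitem(last=False)
            self.used_bytes -= len(evicted)
        self.entries[key] = data
        self.used_bytes += len(data)


# A dictionary stands in for the object store in this illustration.
object_store = {"features/a": b"aaaa", "features/b": b"bbbb"}
cache = ReadThroughCache(fetch=object_store.__getitem__, capacity_bytes=8)
cache.get("features/a")   # miss: fetched from the backing store
cache.get("features/a")   # hit: served from the cache
print(cache.hits, cache.misses)
```

Because data is only copied into the cache on read, the object store stays the system of record and nothing has to be migrated; each hit also avoids one backing-store request, which is where savings on request costs would come from.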
October 28, 2025
AI/ML Infra Meetup | AI at Scale: Architecting Scalable, Deployable and Resilient Infrastructure

Pratik Mishra delivered insights on architecting scalable, deployable, and resilient AI infrastructure at scale. His discussion on fault tolerance, checkpoint optimization, and the democratization of AI compute through AMD's open ecosystem resonated strongly with the challenges teams face in production ML deployments.
September 30, 2025
AI/ML Infra Meetup | Alluxio + S3: A Tiered Architecture for Latency-Critical, Semantically-Rich Workloads

In this talk, Bin Fan, VP of Technology at Alluxio, discusses building tiered architectures that bring sub-millisecond latency to S3-based workloads. The comparison showing Alluxio's 45x performance improvement over S3 Standard and 5x over S3 Express One Zone demonstrates the critical role that a performance and caching layer plays in modern AI infrastructure.
September 30, 2025