We are excited to present Alluxio 2.0 to our community. The goal of Alluxio 2.0 was to significantly enhance data accessibility with improved APIs, expand use cases supported to include active workloads as well as better metadata management and availability to support hyperscale deployments. Alluxio 2.0 Preview Release is the first major milestone on this path to Alluxio 2.0 and includes many new features.
In this talk, I will give an overview of the motivations and design decisions behind the major changes in the Alluxio 2.0 release. We will touch on the key features:
– New off-Heap metadata storage leveraging embedded RocksDB to scale up Alluxio to handle a billion files;
– Improved Alluxio POSIX API to support legacy and machine-learning workloads;
– A fully contained, distributed embedded journal system based on RAFT consensus algorithm in high availability mode;
– A lightweight distributed compute framework called “Alluxio Job Service” to support Alluxio operations such as active replication, async-persist, cross mount move/copy and distributed loading;
– Support for mounting and connecting to any number of HDFS clusters of different versions at the same time;
Active file system sync between Alluxio and HDFS as under storage.
Alluxio 2.0 Preview Release Deep Dive
We are excited to present Alluxio 2.0 to our community. The goal of Alluxio 2.0 was to significantly enhance data accessibility with improved APIs, expand use cases supported to include active workloads as well as better metadata management and availability to support hyperscale deployments. Alluxio 2.0 Preview Release is the first major milestone on this path to Alluxio 2.0 and includes many new features.
In this talk, I will give an overview of the motivations and design decisions behind the major changes in the Alluxio 2.0 release. We will touch on the key features:
– New off-Heap metadata storage leveraging embedded RocksDB to scale up Alluxio to handle a billion files;
– Improved Alluxio POSIX API to support legacy and machine-learning workloads;
– A fully contained, distributed embedded journal system based on RAFT consensus algorithm in high availability mode;
– A lightweight distributed compute framework called “Alluxio Job Service” to support Alluxio operations such as active replication, async-persist, cross mount move/copy and distributed loading;
– Support for mounting and connecting to any number of HDFS clusters of different versions at the same time;
Active file system sync between Alluxio and HDFS as under storage.
Video:
Presentation slides:
Real-time Data Processing for Sales Attribution Analysis with Alluxio, Spark and Hive at VIPShop
Vipshop is a leading eCommerce company in China with over 15 million active daily users. Our ETL jobs primarily run against data on HDFS, which can no longer meet the increasing swiftness and stability demand for certain real-time jobs. In this talk, I will explain how we’ve replaced HDFS with Memory+ HDD managed by Alluxio to speed up data accesses for all our Sales Attribution applications running on Spark and Hive, this system has been in production for more than 2 years. As more old fashion ETL SQLs are being converted into real-time jobs, leveraging Alluxio for caching has become one of the widely considered performance tuning solution. I will share our criteria when selecting use cases that can effectively get a boost by switching to Alluxio.
Our future work includes using Alluxio as an abstraction layer for the \tmp\ directory in our main Hadoop clusters, and we are also considering Alluxio to cache the hot data in our 600+ node Presto clusters.
Bio:
Wanchun Wang is the Chief Architect and has been with VIPShop for over 5 years and his interests focus on processing large amounts of data such as building streaming pipelines, optimizing ETL applications, and designing in-house ML & DL platforms. He is currently managing big data teams that are responsible for batch, real-time, and data warehouse systems.
Video:
Acknowledgment:
Our event partner AICamp (http://www.xnextcon.com) is a global online platform for engineers, data scientists to learn and practice AI, ML, DL, Data Science, with 80000+ developers, and 40+ cities local study groups around the world.
We are excited to present Alluxio 2.0 to our community. The goal of Alluxio 2.0 was to significantly enhance data accessibility with improved APIs, expand use cases supported to include active workloads as well as better metadata management and availability to support hyperscale deployments. Alluxio 2.0 Preview Release is the first major milestone on this path to Alluxio 2.0 and includes many new features.
In this talk, I will give an overview of the motivations and design decisions behind the major changes in the Alluxio 2.0 release. We will touch on the key features:
– New off-Heap metadata storage leveraging embedded RocksDB to scale up Alluxio to handle a billion files;
– Improved Alluxio POSIX API to support legacy and machine-learning workloads;
– A fully contained, distributed embedded journal system based on RAFT consensus algorithm in high availability mode;
– A lightweight distributed compute framework called “Alluxio Job Service” to support Alluxio operations such as active replication, async-persist, cross mount move/copy and distributed loading;
– Support for mounting and connecting to any number of HDFS clusters of different versions at the same time;
Active file system sync between Alluxio and HDFS as under storage.
Videos:
Presentation Slides:
Complete the form below to access the full overview:
.png)
Videos

Coupang is a leading e-commerce company in South Korea, with over 50,000 employees and $20+ billion in annual revenue. Coupang's AI platform team builds and manages a large-scale AI platform in AWS for machine learning engineers to train models that enhance and customize product search results and product recommendations for its 100+ million customers.
As the search and recommendation models evolve, optimizing the underlying infrastructure for AI/ML workloads is essential for the e-commerce business. Coupang's platform team actively sought to improve their model training pipeline to boost machine learning engineers' productivity, publish models to production faster, and reduce operational costs.
Coupang focused on addressing several key areas:
- Shortening data preparation and model training time
- Improving GPU utilization in training clusters in different regions
- Reducing S3 API and egress costs incurred from copying large training datasets across regions
- Simplifying the operational complexity of storage system management
In this tech talk, Hyun Jung Baek, Staff Backend Engineer at Coupang, will share best practices for leveraging distributed caching to power search and recommendation model training infrastructure.
Hyun will discuss:
- How Coupang builds a world-class large-scale AI platform for machine learning engineers to deliver better search and recommendation models
- How adding distributed caching to their multi-region AI infrastructure improves GPU utilization, accelerates end-to-end training time, and significantly reduces cross-region data transfer costs.
- How to simplify platform operations and to easily deploy the same architecture to new GPU clusters.
About the Speaker
Hyun Jung Baek is a Staff Backend Engineer at Coupang.
Deepseek’s recent announcement of the Fire-flyer File System (3FS) has sparked excitement across the AI infra community, promising a breakthrough in how machine learning models access and process data.
In this webinar, an expert in distributed systems and AI infrastructure will take you inside Deepseek 3FS, the purpose-built file system for handling large files and high-bandwidth workloads. We’ll break down how 3FS optimizes data access and speeds up AI workloads as well as the design tradeoffs made to maximize throughput for AI workloads.
This webinar you’ll learn about how 3FS works under the hood, including:
✅ The system architecture
✅ Core software components
✅ Read/write flows
✅ Data distribution/placement algorithms
✅ Cluster/node management and disaster recovery
Whether you’re an AI researcher, ML engineer, or infrastructure architect, this deep dive will give you the technical insights you need to determine if 3FS is the right solution for you.