To provide model training with the best experience, Tencent has implemented a 1000-node Alluxio cluster and designed a scalable, robust, and performant architecture to speed up Ceph storage for game AI training. This blog will give you insight into how Alluxio has been implemented and optimized at Tencent.
2021 marked accelerated growth for the Alluxio Open Source Project. We could not be more grateful for what the community has achieved together in this past year. This blog provides a glimpse of the year long summary of our community growth.
This blog is the last one in the machine learning series. Our first blog introduced the what and why of our solution, and the second blog compared traditional and Alluxio solutions. This blog will demonstrate how to set up and benchmark the end-to-end performance of the training process.
This blog is the second in the machine learning series following the previous one, which discussed Alluxio’s solution to improve training performance and simplify data management. With the help of Alluxio, loading data from cloud storage, training and caching data can be done in a transparent and distributed way as a part of the training process, thus improving training performance and simplifying data management. In this blog 2 of the series, we focus on comparing traditional solutions with Alluxio’s.
In this blog, we provide an overview of Alluxio’s AI/ML model training solution. For more details about the reference architecture and benchmarking results, please refer to the full length whitepaper.
Running inference at scale is challenging. In this blog, we will share our observations and the practice to use Alluxio to speed up the I/O performance for large-scale ML/DL offline inference at Microsoft Bing.
Metadata synchronization (sync) is a core feature in Alluxio that keeps files and directories consistent with their source of truth in under storage systems, thus making it simple for users to reason the data retrieved from Alluxio. Meanwhile, understanding the internal process is important in order to tune the performance. This article describes the design and the implementation in Alluxio to keep metadata synchronized.
With this release, Alluxio has strengthened its position as a de-facto data unification and acceleration solution in data analytics and machine learning pipelines. The solution is optimized to support Spark, Presto, Tensorflow, and PyTorch, and is available on multiple cloud platforms such as AWS, GCP, and Azure Cloud, and also on Kubernetes in private data centers or public clouds.
Alluxio is the data orchestration platform to unify data silos across heterogeneous environments. The following blog will discuss the architecture combining Spark with Alluxio.