Alluxio Blog

Machine Learning Model Training with Alluxio: Part 2 – Comparable Analysis

January 12, 2022 By Lu Qiu and Bin Fan

This is the second blog in our machine learning series. The first discussed how Alluxio improves training performance and simplifies data management: with Alluxio, loading data from cloud storage, caching it, and training on it can all happen transparently and in a distributed way as part of the training process. In this second blog, we focus on comparing traditional solutions with Alluxio’s.

Machine Learning Model Training with Alluxio: Part 1 – Solution Overview

January 6, 2022 By Lu Qiu, Bin Fan and Hope Wang

In this blog, we provide an overview of Alluxio’s AI/ML model training solution. For more details about the reference architecture and benchmarking results, please refer to the full-length whitepaper.

Speed up Large-scale ML/DL Offline Inference Jobs with Alluxio at Microsoft Bing

January 6, 2022 By Binyang Li and Qianxi Zhang

Running inference at scale is challenging. In this blog, we share our observations and our practice of using Alluxio to improve I/O performance for large-scale ML/DL offline inference at Microsoft Bing.

Metadata Synchronization in Alluxio: Design, Implementation and Optimization

December 14, 2021 By David Zhu

Metadata synchronization (sync) is a core feature in Alluxio that keeps files and directories consistent with their source of truth in under storage systems, making it simple for users to reason about the data retrieved from Alluxio. Understanding the internal process is also important for tuning performance. This article describes the design and implementation of metadata synchronization in Alluxio.

What’s New in Alluxio 2.7: Enhanced Scalability, Stability and Major Improvements in AI/ML Training Efficiency

November 16, 2021 By Adit Madan and Hope Wang

With this release, Alluxio has strengthened its position as a de facto data unification and acceleration solution in data analytics and machine learning pipelines. The solution is optimized to support Spark, Presto, TensorFlow, and PyTorch, and is available on multiple cloud platforms such as AWS, GCP, and Azure Cloud, as well as on Kubernetes in private data centers or public clouds.

Speeding Up the Atlas Supercomputing Platform with Fluid + Alluxio

November 8, 2021 By Dongdong Lv and Qingsong Liu

Unisound is an artificial intelligence company focusing on Internet of Things services. Unisound’s AI technology stack spans perception and expression capabilities for signals, voice, images, and text, as well as cognitive technologies such as knowledge, understanding, analysis, and decision-making, working toward a multi-modal AI system. Atlas is the supercomputing platform that supports all kinds of AI applications, including model training and inference.

Alluxio Use Cases Overview: Unify silos with Data Orchestration

October 19, 2021 By Hope Wang, Adit Madan and Bin Fan

This blog is the first in a series introducing Alluxio as the data platform to unify data silos across heterogeneous environments. The next blog will include insights from PrestoDB committer Beinan Wang to uncover the value for analytics use cases, specifically with PrestoDB as the compute engine.

What’s New in Alluxio 2.6: Better Performance for AI/ML Workloads plus Increased Operating Metrics Visibility

July 1, 2021 By Adit Madan

Alluxio 2.6 significantly improves the performance of data-intensive AI/ML workloads across any storage, and also improves the general maintainability and visibility of Alluxio clusters, especially for large-scale deployments. We have taken the feedback and contributions from the community and introduced features that simplify deployment, add new data management capabilities, optimize performance, and provide enhanced visibility into system behavior.