Alluxio Blog

GPUs Are Fast, I/O is Your Bottleneck

November 7, 2023 By Hope Wang

This article was initially posted on ITOpsTimes. Unless you’ve been living off the grid, the hype around Generative AI has been impossible to ignore. A critical component fueling this AI revolution is the underlying computing power, GPUs. The lightning-fast GPUs enable speedy model training. But a hidden bottleneck can severely limit their potential – I/O. If … Continued

Consistent Hashing in Alluxio DORA

October 31, 2023 By Jiaming Mai

Consistent hashing is a technique that allows hash rings to be expanded or shrunk dynamically while remapping only a small share of keys. Alluxio’s DORA (Decentralized Object Repository Architecture) uses consistent hashing for load balancing when scaling nodes. To achieve fast performance, strict consistency, and load balancing, we analyze, evaluate, and select the most suitable consistent … Continued
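To make the idea concrete, here is a minimal sketch of a generic hash ring with virtual nodes. It is not the algorithm DORA selected (the excerpt above is truncated before naming it); the class name `HashRing`, the `vnodes` parameter, and the worker and file names are all hypothetical, chosen only for illustration.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # MD5 is used only as a stable, well-spread hash, not for security.
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Illustrative consistent-hash ring; each node owns `vnodes` points."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._points = []  # sorted hash positions on the ring
        self._owner = {}   # hash position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        for i in range(self.vnodes):
            point = _hash(f"{node}#{i}")
            if point not in self._owner:  # skip the (unlikely) collision
                self._owner[point] = node
                bisect.insort(self._points, point)

    def remove_node(self, node):
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def get_node(self, key):
        if not self._points:
            raise LookupError("ring is empty")
        # First point clockwise from the key's hash, wrapping around at the end.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


ring = HashRing(["worker-1", "worker-2", "worker-3"])
keys = [f"file-{i}" for i in range(1000)]
before = {k: ring.get_node(k) for k in keys}
ring.add_node("worker-4")
moved = [k for k in keys if ring.get_node(k) != before[k]]
print(f"{len(moved)} of {len(keys)} keys moved after adding a node")
```

In this sketch, every key that changes owner when a node joins moves to the new node; keys never shuffle between the existing nodes. That property is what "minimal disruption" refers to.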

Introducing DORA: The Next-generation Alluxio Architecture

October 18, 2023 By Beinan Wang, Bin Fan, Bowen Ding, Jiaming Mai, Hua Huang, Lu Qiu, Jianjian Xie, Shawn Sun, Lucy Ge, Chunxu Tang, Kai Zhang and Hope Wang

Today, we are thrilled to launch the Alluxio Enterprise AI product. One of the key innovations is the introduction of the next-generation architecture DORA – a Decentralized Object Repository Architecture. This blog talks about our development of the DORA architecture, including our motivation, design decisions, and implementation. 1. Moving from Data Analytics to the AI … Continued

Introducing Alluxio Enterprise AI and A Vision Beyond Unintelligent Storage

October 18, 2023 By Adit Madan, Bin Fan and Haoyuan Li

We take great pride in the Alluxio Data Platform serving many of the most critical data-driven applications in the world today. Each of us interacts with platforms powered by Alluxio on a daily basis, and you do too, perhaps without knowing it. From the voice assistant we speak to, the bank we transact with, … Continued

A Deep Dive into Caching in Presto

October 11, 2023 By Hope Wang and Beinan Wang

This article was initially posted on InfoWorld. Understand the caching mechanisms for the popular distributed SQL engine and how to use them to improve query speed and efficiency. Presto is a popular, open source, distributed SQL engine that enables organizations to run interactive analytic queries on multiple data sources at a large scale. Caching is a typical optimization … Continued

A Deep Dive into the Call Chain Relationship Between Presto, Hive, and Alluxio

September 11, 2023 By Jiaming Mai

Alluxio is commonly used with Presto and Hive to accelerate queries. Understanding how Presto+Hive+Alluxio work together and the flow from SQL query to low-level file system operations is key to tuning performance. This post will dive into the relationship between Presto, Hive, and Alluxio. We will walk you through how a SQL query executes in … Continued

Alluxio Kubernetes Operator Tutorial: Simplifying Deploying and Managing Alluxio Clusters

August 14, 2023 By Shawn Sun, Beinan Wang and Hope Wang

This blog provides a tutorial on using the Kubernetes operator to simplify deploying and managing Alluxio clusters on Kubernetes. Introduction: The Alluxio Kubernetes operator makes deploying and managing Alluxio and the datasets on Kubernetes easier. With the operator, Alluxio clusters can be deployed and managed seamlessly like any other native Kubernetes application. The operator handles … Continued

Speed Trino Queries with These Performance-Tuning Tips

August 2, 2023 By Hope Wang and Beinan Wang

Originally published at The New Stack: https://thenewstack.io/speed-trino-queries-with-these-performance-tuning-tips/ In this article, we will discuss how data engineers and data infrastructure engineers can make Trino, a widely used query engine, faster and more efficient. An open source distributed SQL query engine, Trino is widely used for data analytics on distributed data storage. Optimizing Trino to make it faster … Continued

Top Tips and Tricks for PyTorch Model Training Performance Tuning [2023]

July 22, 2023 By Hope Wang, Beinan Wang and Chunxu Tang

Get the latest and greatest tips to accelerate your PyTorch model training for machine learning and deep learning. PyTorch, an open-source machine learning framework, has become the de facto choice for many organizations to develop and deploy deep learning models. Model training is the most compute-intensive phase of the machine learning pipeline. It requires continuous … Continued