Alluxio Blog

The Data-Driven Heartbeat of Artificial Intelligence

This article was initially posted on Solutions Review. Artificial Intelligence (AI) has consistently been in the limelight as the precursor of the next technological era. Its limitless applications, ranging from simple chatbots to intricate neural networks capable of deep learning, promise a future where machines understand and replicate complex human processes. Yet, at the heart of … Continued

GPUs Are Fast, I/O is Your Bottleneck

This article was initially posted on ITOpsTimes. Unless you’ve been living off the grid, the hype around Generative AI has been impossible to ignore. A critical component fueling this AI revolution is the underlying computing power, GPUs. The lightning-fast GPUs enable speedy model training. But a hidden bottleneck can severely limit their potential – I/O. If … Continued

Consistent Hashing in Alluxio DORA

Consistent hashing is a technique that lets a hash ring grow or shrink dynamically while remapping only a small fraction of keys. Alluxio’s DORA (Decentralized Object Repository Architecture) uses consistent hashing to balance load as nodes are added or removed. To achieve fast performance, strict consistency, and load balancing, we analyze, evaluate, and select the most suitable consistent … Continued

Introducing DORA: The Next-generation Alluxio Architecture

Today, we are thrilled to launch the Alluxio Enterprise AI product. One of the key innovations is the introduction of the next-generation architecture DORA – a Decentralized Object Repository Architecture. This blog talks about our development of the DORA architecture, including our motivation, design decisions, and implementation. 1. Moving from Data Analytics to the AI … Continued

A Deep Dive into Caching in Presto

This article was initially posted on InfoWorld. Understand the caching mechanisms for the popular distributed SQL engine and how to use them to improve query speed and efficiency. Presto is a popular, open source, distributed SQL engine that enables organizations to run interactive analytic queries on multiple data sources at a large scale. Caching is a typical optimization … Continued

Alluxio Kubernetes Operator Tutorial: Simplifying Deploying and Managing Alluxio Clusters

This blog provides a tutorial on using the Kubernetes operator to simplify deploying and managing Alluxio clusters on Kubernetes. Introduction: The Alluxio Kubernetes operator makes deploying and managing Alluxio and the datasets on Kubernetes easier. With the operator, Alluxio clusters can be deployed and managed seamlessly like any other native Kubernetes application. The operator handles … Continued

Speed Trino Queries with These Performance-Tuning Tips

Originally published at The New Stack: https://thenewstack.io/speed-trino-queries-with-these-performance-tuning-tips/ In this article, we will discuss how data engineers and data infrastructure engineers can make Trino, a widely used query engine, faster and more efficient. An open source distributed SQL query engine, Trino is commonly used for data analytics on distributed data storage. Optimizing Trino to make it faster … Continued