Alluxio Blog

How to Become a Contributor to Alluxio Open Source Project

February 8, 2022 By Bin Fan

This tutorial guides newcomers through completing a new-contributor task and becoming an open-source contributor to the Alluxio project.

Alluxio and Apache Ranger Best Practices

February 2, 2022 By Greg Palmer

As data stewards and security teams provide broader access to their organization’s data lake environments, having a centralized way to manage fine-grained access policies becomes increasingly important. Alluxio can use Apache Ranger’s centralized access policies in two ways: 1) directly controlling access to virtual paths in the Alluxio virtual file system or 2) enforcing existing access policies for the HDFS under stores.

Using Consistent Hashing in Presto to Improve Caching Data Locality in Dynamic Clusters

February 2, 2022 By Rongrong Zhong

Running Presto with Alluxio is gaining popularity in the community. It avoids the long latency of reading data from remote storage by using SSD or memory to cache hot datasets close to Presto workers. Presto supports hash-based soft affinity scheduling to enforce that only one or two copies of the same data are cached in the entire cluster, which improves cache efficiency by allowing more hot data to be cached locally. The current hashing algorithm, however, does not work well when the cluster size changes. This article introduces a new hashing algorithm for soft affinity scheduling, consistent hashing, to address this problem.
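To make the cluster-resize problem concrete, here is a minimal Python sketch of consistent hashing with virtual nodes, compared against a simple modulo-based assignment. This is not Presto's implementation: the summary does not say which algorithm Presto currently uses, so the modulo scheme is only an assumed stand-in, and all names here (`ConsistentHashRing`, `modulo_assign`, the worker names, and the file paths) are hypothetical.

```python
import bisect
import hashlib


def stable_hash(key):
    """Deterministic 64-bit hash (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


def modulo_assign(key, nodes):
    """Assumed baseline: pick a node by hash modulo the cluster size."""
    return nodes[stable_hash(key) % len(nodes)]


class ConsistentHashRing:
    """Hash ring where each node owns several points (virtual nodes)."""

    def __init__(self, nodes, virtual_nodes=100):
        # Place `virtual_nodes` points per node on the ring, sorted by hash.
        self._ring = sorted(
            (stable_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(virtual_nodes)
        )
        self._points = [point for point, _ in self._ring]

    def assign(self, key):
        # A key belongs to the first ring point clockwise from its hash,
        # wrapping around to the start of the ring if needed.
        idx = bisect.bisect_right(self._points, stable_hash(key)) % len(self._points)
        return self._ring[idx][1]


def moved_fraction(assign_before, assign_after, keys):
    """Fraction of keys whose assigned node differs between two schemes."""
    moved = sum(1 for k in keys if assign_before(k) != assign_after(k))
    return moved / len(keys)


# Hypothetical workload: 10,000 file paths, cluster grows from 10 to 11 workers.
keys = [f"hdfs://warehouse/part-{i}.orc" for i in range(10_000)]
old_nodes = [f"worker-{i}" for i in range(10)]
new_nodes = old_nodes + ["worker-10"]

old_ring = ConsistentHashRing(old_nodes)
new_ring = ConsistentHashRing(new_nodes)

modulo_moved = moved_fraction(
    lambda k: modulo_assign(k, old_nodes),
    lambda k: modulo_assign(k, new_nodes),
    keys,
)
ring_moved = moved_fraction(old_ring.assign, new_ring.assign, keys)

print(f"modulo hashing: {modulo_moved:.1%} of keys reassigned")
print(f"consistent hashing: {ring_moved:.1%} of keys reassigned")
```

With the modulo scheme, most keys map to a different worker after the resize, so their cached copies become useless. With the ring, the only keys that move are those now owned by the new worker, which is why the rest of the cache stays warm.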

How to Set Up Monitoring System for Alluxio with Prometheus and Grafana in 10 Minutes

January 31, 2022 By Pan Liu and Hope Wang

This blog describes how Tencent uses Prometheus and Grafana to set up a monitoring system for Alluxio in 10 minutes.

Thousand-Node Alluxio Cluster Powers Game AI Platform – A Production Case Study from Tencent

January 26, 2022 By Bing Zheng, Baolong Mao and Zhizheng Pan

To provide model training with the best experience, Tencent has implemented a 1000-node Alluxio cluster and designed a scalable, robust, and performant architecture to speed up Ceph storage for game AI training. This blog will give you insight into how Alluxio has been implemented and optimized at Tencent.

A Year with Alluxio Community 2021

January 20, 2022 By Bin Fan and Jasmine Wang

2021 marked accelerated growth for the Alluxio Open Source Project. We could not be more grateful for what the community has achieved together in this past year. This blog provides a year-long summary of our community growth.

Machine Learning Model Training with Alluxio: Part 3 – Benchmarking

January 18, 2022 By Lu Qiu and Bin Fan

This blog is the last one in the machine learning series. Our first blog introduced the what and why of our solution, and the second blog compared traditional and Alluxio solutions. This blog will demonstrate how to set up and benchmark the end-to-end performance of the training process.