Author: Bin Fan at Alluxio

Building High-Performance Data Lake Using Apache Hudi and Alluxio at T3Go

November 20, 2020 By Trevor Zhang (T3Go), Vino Yang (T3Go), Jasmine Wang and Bin Fan

T3Go’s high-performance data lake, built on Apache Hudi and Alluxio, shortened data ingestion into the lake by up to a factor of 2. Data analysts using Presto, Hudi, and Alluxio together to query data on the lake saw queries speed up by a factor of 10.

Data Consistency Model in Alluxio

October 30, 2020 By Baolong Mao, Jasmine Wang and Bin Fan

When applications are only reading and writing through Alluxio, the Alluxio file system provides strong consistency. However, when clients are writing data across both Alluxio and under storage, the consistency depends on the Alluxio write type and under storage type. This article discusses what to expect in each scenario.

Accelerating and Scaling Big Data Analytics with Alluxio and Intel® Optane™ Persistent Memory

May 8, 2020 By Jian Zhang (Intel Corporation), Eugene Ma (Intel Corporation) and Bin Fan

International Data Corporation (IDC) reported that the global datasphere will grow from 33 zettabytes in 2018 to 175 zettabytes by 2025. This trend is further complicated by the variety and velocity of data growth, and it continuously changes how data is collected, stored, processed, and analyzed. New analytics solutions from machine … Continued

Alluxio Accelerates Deep Learning in Hybrid Cloud using Intel’s Analytics Zoo open source platform powered by oneAPI

April 27, 2020 By Bin Fan

This article describes how Alluxio can accelerate the training of deep learning models in a hybrid cloud environment when using Intel’s Analytics Zoo open source platform, powered by oneAPI. It covers the new architecture and workflow, as well as Alluxio’s performance benefits and benchmark results.

What’s new in Alluxio 2.2

March 11, 2020 By Bin Fan, Gene Pang, Zac Blanco and Haoyuan Li

With this release comes the General Availability (GA) of Alluxio Structured Data Services (SDS), the subsystem of Alluxio responsible for managing and transforming structured data, such as databases, tables, and partitions.

Data Orchestration Summit Recap and Highlights!

November 12, 2019 By Amelia Wong and Bin Fan

We are delighted by the success of the inaugural Data Orchestration Summit on Nov. 7, 2019! Organized by Alluxio, this one-day event was sold out with nearly 400 attendees! Data engineers, cloud engineers, and data scientists joined the talks of 24 industry leaders from all over the globe to share their experiences building cloud-native data and … Continued

Improving Spark Memory Resource with Off-Heap In-Memory Storage

November 1, 2019 By Bin Fan and Adit Madan

In the previous tutorial “Getting Started with Spark Caching using Alluxio in 5 Minutes”, we demonstrated how to get started with Spark and Alluxio. To share more thoughts and experiments on how Alluxio enhances Spark workloads, this article focuses on how Alluxio helps to optimize the memory utilization of Spark applications. For users who are … Continued

Tutorial: Presto + Alluxio + Hive Metastore on Your Laptop in 10 min

October 23, 2019 By Bin Fan

This tutorial walks you through setting up a stack of Presto, Alluxio, and Hive Metastore on your local server, and it demonstrates how to use Alluxio as the caching layer for Presto queries.

Getting Started with EMR Hive on Alluxio in 10 Minutes

October 8, 2019 By Bin Fan

This tutorial describes the steps to set up an EMR cluster with Alluxio as a distributed caching layer for Hive and to run sample queries that access data in S3 through Alluxio.

Bin Fan

VP Open Source and Founding Engineer, Alluxio