Alluxio Blog

Deep Learning at Alibaba Cloud with Alluxio – Running PyTorch on HDFS

June 19, 2020 By Yang Che (Alibaba)

Google’s TensorFlow and Facebook’s PyTorch are two Deep Learning frameworks that have been popular with the open source community. Although PyTorch is still a relatively new framework, many developers have successfully adopted it due to its ease of use. By default, PyTorch does not support Deep Learning model training directly in HDFS, which brings challenges …

Efficient Model Training in the Cloud with Kubernetes, TensorFlow, and Alluxio

May 22, 2020 By Rong Gu (Nanjing University) and Yang Che (Alibaba)

Alibaba, Alluxio, and Nanjing University collaborated to tackle the problems of Deep Learning model training in the cloud. Our goal was to reduce the cost and complexity of data access for Deep Learning training in a hybrid environment. The work resulted in a reduction of over 40% in training time and cost.

Accelerating and Scaling Big Data Analytics with Alluxio and Intel® Optane™ Persistent Memory

May 8, 2020 By Jian Zhang (Intel Corporation), Eugene Ma (Intel Corporation) and Bin Fan

International Data Corporation (IDC) reported that the global datasphere will grow from 33 zettabytes in 2018 to 175 zettabytes by 2025. This trend is made more complicated by the variety and velocity of data growth, and it continuously changes how data is collected, stored, processed and analyzed. New analytics solutions from machine …

Alluxio Accelerates Deep Learning in Hybrid Cloud using Intel’s Analytics Zoo open source platform powered by oneAPI

April 27, 2020 By Bin Fan

This article describes how Alluxio can accelerate the training of deep learning models in a hybrid cloud environment when using Intel’s Analytics Zoo open source platform, powered by oneAPI. It covers the new architecture and workflow, as well as Alluxio’s performance benefits and benchmark results.

Everything you want to know about how to decouple SQL engines from Hive Data Warehouse

March 30, 2020 By Gene Pang

Are you using SQL engines, such as Presto, to query an existing Hive data warehouse and running into challenges such as an overloaded Hive Metastore with slow and unpredictable access, unoptimized data formats and layouts (for example, too many small files), or a lack of influence over the existing Hive system and other Hive applications?

Serving Structured Data in Alluxio: Example

March 11, 2020 By Gene Pang

This article goes through a simple example to illustrate how Structured Data Management, available in the latest Alluxio 2.2.0 release, helps SQL and structured data workloads.

Serving Structured Data in Alluxio: Concept

March 11, 2020 By Gene Pang

This article introduces Structured Data Management, available in the latest Alluxio 2.2.0 release, a new effort to provide further benefits to SQL and structured data workloads using Alluxio.

What’s new in Alluxio 2.2

March 11, 2020 By Bin Fan, Gene Pang, Zac Blanco and Haoyuan Li

With this release comes the General Availability (GA) of Alluxio Structured Data Services (SDS), the subsystem of Alluxio responsible for managing and transforming structured data, such as databases, tables, and partitions.

Kubernetes, Alluxio and the Disaggregated Analytics Stack

November 20, 2019 By Dipti Borkar

TL;DR: First the news: Alluxio support for K8s Helm charts is now available! K8s is a certified environment for Alluxio. Now the takeaway: Alluxio brings back data locality for the disaggregated analytics stack in K8s. How? Read on. There’s no arguing the rise of containers in real-world …