Resources

Blog

Accelerating Analytics by 200% with Impala, Alluxio, and HDFS at Tencent

In this article, Honghan Tian describes how engineers in the Data Service Center (DSC) at Tencent PCG (Platform and Content Business Group) leverage Alluxio to optimize analytics performance and minimize operating costs in building Tencent Beacon Growing, a real-time data analytics platform.

On Demand Videos

Tech Talk: Build a hybrid data lake and burst processing to Google Cloud Dataproc with Alluxio

Blog

Efficient Model Training in the Cloud with Kubernetes, TensorFlow, and Alluxio

This article presents a collaboration between Alibaba, Alluxio, and Nanjing University to tackle the problem of Deep Learning model training in the cloud. It analyzes the performance bottlenecks in the architecture and details the optimizations applied to each component. Our goal was to reduce the cost and complexity of data access for Deep Learning training in a hybrid environment, which resulted in a reduction of over 40% in training time and cost.

White Paper

Using Alluxio to Optimize and Improve Performance of Kubernetes-Based Deep Learning in the Cloud

Featuring Alibaba Cloud Container Service Team Case Study

Blog

Accelerating and Scaling Big Data Analytics with Alluxio and Intel Optane Persistent Memory

White Paper

Alluxio Accelerates Deep Learning in Hybrid Cloud using Intel’s Analytics Zoo open source platform powered by oneAPI

On Demand Videos

Burst Presto & Spark workloads to AWS EMR with no data copies

ALLUXIO COMMUNITY OFFICE HOUR

Blog

Alluxio Accelerates Deep Learning in Hybrid Cloud using Intel’s Analytics Zoo open source platform powered by oneAPI

This article describes how Alluxio can accelerate the training of deep learning models in a hybrid cloud environment when using Intel’s Analytics Zoo open source platform, powered by oneAPI. It covers the new architecture and workflow, as well as Alluxio’s performance benefits and benchmark results.

White Paper

Get Insights Faster with Alluxio and Intel

On Demand Videos

Scalable and Highly-available Distributed File System Metadata Service Using gRPC, RocksDB and RAFT

ALLUXIO COMMUNITY OFFICE HOUR

White Paper

“Zero-Copy” Hybrid Cloud for Data Analytics – Strategy, Architecture and Benchmark Report

Blog

Everything you want to know about how to decouple SQL engines from Hive Data Warehouse

Are you using SQL engines, such as Presto, to query an existing Hive data warehouse and running into challenges such as an overloaded Hive Metastore with slow and unpredictable access, unoptimized data formats and layouts (for example, too many small files), or a lack of influence over the existing Hive system and other Hive applications?

On Demand Videos

Optimizing Query Performance by Decoupling Presto and Hive Data Warehouse

ALLUXIO COMMUNITY OFFICE HOUR

Blog

Serving Structured Data in Alluxio: Concept

This article introduces Structured Data Management available in the latest Alluxio 2.2.0 release, a new effort to provide further benefits to SQL and structured data workloads using Alluxio.

Blog

Serving Structured Data in Alluxio: Example

This article walks through a simple example to illustrate how Structured Data Management, available in the latest Alluxio 2.2.0 release, helps SQL and structured data workloads.

Blog

What’s new in Alluxio 2.2

With this release comes the General Availability (GA) of Alluxio Structured Data Services (SDS), the subsystem of Alluxio responsible for managing and transforming structured data, such as databases, tables, and partitions.

On Demand Videos

Bursting Apache Spark Workloads to the Cloud on Remote Data

ALLUXIO COMMUNITY OFFICE HOUR

On Demand Videos

Testing Distributed System at Scale for the Cost of a Large Pizza on AWS

ALLUXIO COMMUNITY OFFICE HOUR

On Demand Videos

Running Presto with Alluxio on Amazon EMR

ALLUXIO COMMUNITY OFFICE HOUR

White Paper

Accelerating analytics & AI in Kubernetes with Alluxio Open Source Data Orchestration

Presentation

CNCF Member Webinar: Improving Data Locality for Analytics Jobs on Kubernetes Using Alluxio

In the on-prem days, one key performance optimization for Apache Hadoop or Apache Spark workloads was to run tasks on nodes that hold the relevant HDFS data locally. However, while adopting the cloud and Kubernetes makes scaling compute workloads exceptionally easy, HDFS is often not an option there. Accessing data efficiently from cloud-native storage services such as AWS S3, or even from on-premises HDFS, becomes harder because data locality is lost.

Originating from UC Berkeley’s AMPLab, the open source project Alluxio approaches this problem in a new way: it moves data closer to compute workloads efficiently and on demand, and it unifies data across multiple or remote clouds, among other capabilities. This webinar describes the concepts and internal mechanisms of running Spark and Alluxio together in Kubernetes to improve data locality even when the storage service is outside the cluster or remote.

In particular, we will go over:

Why Spark can make locality-aware scheduling decisions when working with Alluxio in a K8s environment that uses the host network
Why a pod running Alluxio can share data efficiently with a pod running Spark on the same host through a domain socket and a hostPath volume
The Alluxio roadmap for further improving analytics jobs such as Spark and Presto, including the ongoing closer integration with Presto