International Data Corporation (IDC) projects that the global datasphere will grow from 33 zettabytes in 2018 to 175 zettabytes by 2025. The growing variety and velocity of data compound this trend, continuously changing how data is collected, stored, processed, and analyzed. New analytics solutions spanning machine learning, deep learning, and artificial intelligence (AI), along with new architectures and tools, are being developed to extract and deliver value from this huge datasphere.
Among those solutions, the compute and storage disaggregated architecture is becoming increasingly attractive because it lets companies scale storage capacity independently of compute, reducing both capital expenditures and operating expenses. However, this disaggregated architecture introduces performance loss for certain types of workloads: our previous work showed that compute and storage disaggregation can lead to up to a 60% performance regression, since commonly used cloud adaptors do not support transactional writes.
Intel® Optane™ Persistent Memory (PMem) is an innovative memory technology that introduces a new category sitting between memory and storage, delivering a unique combination of affordable large capacity and support for data persistence. Users can build a larger persistent memory tier at affordable cost, providing a flexible (volatile or non-volatile), high-performance storage tier for a variety of workloads, including but not limited to virtual machines in the cloud, in-memory databases, and more. Intel® Optane™ Persistent Memory has two operating modes: Memory Mode and App Direct Mode.
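As a concrete illustration of the two modes, provisioning PMem with the standard `ipmctl`/`ndctl` tooling might look like the sketch below. The device names, region, and mount point are assumptions for illustration, not values from the whitepaper.

```shell
# Provision all PMem modules for App Direct (interleaved) mode;
# a reboot is required for the goal to take effect.
ipmctl create -goal PersistentMemoryType=AppDirect

# After reboot: create an fsdax namespace on region0, exposing the
# PMem as a block device (/dev/pmem0 is an assumed device name).
ndctl create-namespace --mode=fsdax --region=region0

# Format and mount with DAX so file I/O bypasses the page cache,
# enabling legacy storage APIs on top of PMem (SoAD usage).
mkfs.ext4 /dev/pmem0
mount -o dax /dev/pmem0 /mnt/pmem

# Alternatively, configure 100% of PMem as volatile Memory Mode,
# where PMem serves as main memory and DRAM acts as a cache:
ipmctl create -goal MemoryMode=100
```

In Memory Mode no application changes are needed at all; in App Direct Mode the DAX-mounted filesystem is what lets unmodified applications use PMem through regular file APIs.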
We built a new solution made up of Alluxio and Intel Optane Persistent Memory and ran some performance benchmarks. Below you’ll find the overview and results. You can get more details in the whitepaper we wrote.
Alluxio + Intel Optane Persistent Memory
The Alluxio and Intel joint PMem tier solution allows companies to unify on-premises and cloud data silos into a single, cloud-based data layer, increasing data accessibility and elasticity while virtually eliminating the need for copies—for less complexity, lower costs, and greater speed and agility. The new PMem tier accelerates data access and eliminates data bottlenecks.
As an open source data orchestration layer that sits between disaggregated compute and storage, Alluxio can improve the performance of various workloads by bringing data close to compute. Leveraging Alluxio's intelligent multi-tier architecture, Intel® Optane™ PMem is a good fit for a PMem-based tier that further improves performance and reduces cost. Our work shows the Alluxio PMem tier delivers significant speedup over the DRAM tier under the same-cost (ISO-cost) configuration: for a 4TB decision support workload (Parquet format), the PMem tier in SoAD Mode delivers a 2.13x speedup over the no-cache configuration and a 1.92x speedup over the DRAM tier, while the PMem tier in Memory Mode delivers a 1.24x speedup over the no-cache configuration and a 1.12x speedup over the DRAM tier.
Benchmark & Design
Alluxio PMem Tier Design
Intel® Optane™ PMem supports legacy storage APIs in both SoAD (Storage over App Direct) Mode and Memory Mode, so applications using legacy storage APIs require no code changes. This makes PMem in either mode a good fit for Alluxio's intelligent multi-tier architecture. Leveraging Alluxio's current architecture, it is simple to enable Intel® Optane™ PMem in Alluxio: configure it as the Memory tier under Memory Mode, or as the SSD tier under SoAD Mode. PMem's larger capacity compared with DRAM lets the Alluxio PMem tier in Memory Mode cache more data and benefit more workloads, while its higher throughput compared with SSD lets the Alluxio PMem tier in SoAD Mode deliver higher throughput. This new PMem-based tier creates a high-performance, low-cost storage layer for Alluxio.
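A minimal sketch of the corresponding Alluxio worker configuration for the SoAD case, assuming the DAX-mounted PMem filesystem is at /mnt/pmem (the paths and quotas below are illustrative, not the whitepaper's actual settings):

```properties
# alluxio-site.properties (illustrative values)
alluxio.worker.tieredstore.levels=2

# Tier 0: DRAM-backed ramdisk
alluxio.worker.tieredstore.level0.alias=MEM
alluxio.worker.tieredstore.level0.dirs.path=/mnt/ramdisk
alluxio.worker.tieredstore.level0.dirs.quota=16GB

# Tier 1: PMem in SoAD Mode, exposed as the SSD-alias tier
alluxio.worker.tieredstore.level1.alias=SSD
alluxio.worker.tieredstore.level1.dirs.path=/mnt/pmem
alluxio.worker.tieredstore.level1.dirs.quota=1TB
```

Under Memory Mode, no separate tier is needed: the ramdisk backing the MEM tier simply resides in the larger PMem-backed main memory.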
Alluxio PMem Tier Performance
To evaluate the performance of the PMem tier, we conducted multiple tests with different configurations in a five-node environment consisting of two compute nodes and three storage nodes. Each compute node was equipped with two Intel® Xeon® Gold 6240 processors and configured with DRAM and PMem capacities matched in price for an ISO-cost comparison: 24x 32GB DRAM modules versus 8x 128GB PMem modules. Each storage node was configured with 11x 1TB HDDs as data storage. Ceph was used as the disaggregated object storage system and connected to the Hadoop environment with the s3a connector. Hadoop 3.1.2 and Spark 2.3.0 were deployed on the compute nodes. The figure below shows the system topology.
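For reference, pointing Hadoop's s3a connector at a Ceph RADOS Gateway endpoint typically involves core-site.xml properties like the following. The endpoint and credentials here are placeholders, not the actual test values:

```xml
<!-- core-site.xml (placeholder values) -->
<configuration>
  <property>
    <name>fs.s3a.endpoint</name>
    <value>http://ceph-rgw.example.com:7480</value>
  </property>
  <property>
    <name>fs.s3a.access.key</name>
    <value>ACCESS_KEY</value>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>SECRET_KEY</value>
  </property>
  <!-- Ceph RGW buckets are usually addressed path-style -->
  <property>
    <name>fs.s3a.path.style.access</name>
    <value>true</value>
  </property>
</configuration>
```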
The decision support workload models multiple aspects of a decision support system, including queries and data maintenance. We selected 54 queries representing typical SQL query behavior in Hadoop for the test. The tests covered three configurations: without Alluxio, Alluxio on PMem, and Alluxio on DRAM. The PMem case was tested in both SoAD Mode and Memory Mode. Each configuration was run multiple times, and the median result was chosen to mitigate variation.
The Alluxio PMem tier (SoAD Mode) delivers a 2.13x speedup over the no-cache configuration and a 1.92x speedup over the DRAM tier in the ISO-cost configuration for the decision support workload on a 4TB dataset in Parquet format.
You can find more benchmark information, including full configurations and results, in our whitepaper. We hope you find it useful!