Resource Hub

Presentation

Hybrid Collaborative Tiered Storage with Alluxio

Applications that read data from AWS S3 or Alibaba Cloud OSS often suffer serious performance problems because every read crosses a remote network. Alluxio can provide a transparent data cache layer that automatically caches the remote OSS/S3 data an application needs to read. But when does Alluxio itself pull the remote data? Does it cache everything by default, or does it cache on demand? This presentation introduces Alluxio’s tiered storage concept and shows how combining it with ZFS can maximize performance and reduce application development effort.

See 10x performance results for Spark and Hive jobs running on AWS S3. Plus, learn how real-world user Bazaarvoice implemented a tiered storage architecture to boost performance, enabling it to handle data at massive Internet scale to serve its customers.

Presentation

Alluxio Overview: Unify Data at Memory Speed

Alluxio is an open source software solution that connects analytics applications to heterogeneous data sources through a data orchestration layer that sits between compute and storage. It runs on commodity hardware, creating a shared data layer abstracting the files or objects in underlying persistent storage systems. Applications connect to Alluxio via a standard interface, accessing data from a single unified source.

Haoyuan Li and Bin Fan discuss the data center challenges Alluxio addresses, the benefits provided, and an overview of how it works.

Blog

A Better Big Data Ecosystem with Hadoop and Hitachi Content Platform Part1

This blog explores the challenges customers are facing with storing data long term in Hadoop, and discusses what the Hitachi Content Platform team is doing to help our customers solve these challenges with the help of Alluxio. Data is at the center of our digital world and for years Hadoop has been the go-to data processing platform because it is fast and scalable. While Hadoop has solved the data storage and processing problem for the last ~10 years, it achieves this by scaling storage and compute capacity in parallel. As a result, Hadoop environments have continued to expand compute capacity well beyond their needs as more and more of the storage is consumed by older, inactive data.

Presentation

Alluxio in MOMO, JD.com, TalkingData, and Vipshop [Chinese]

Alluxio in MOMO: Accelerating Ad Hoc Analysis

From our friends at MOMO

MOMO, a leading pan-entertainment social platform in China, has deployed Alluxio to accelerate ad-hoc query analytics. While evaluating the best fit for Alluxio in their infrastructure, they conducted several performance tests to understand how ad-hoc query analytics behaved in several scenarios. These tests give real-world insight into the performance benefits Alluxio provides. The MOMO findings include:

With Alluxio, performance was improved 3-5x over the current mode
Even when initially reading ‘cold’ data Alluxio delivered superior performance in most cases
Alluxio can effectively scale-out to improve performance as requirements grow

Blog

Effective caching for Spark RDDs with Alluxio

Recently, Qunar deployed Alluxio with Spark in production and found that Alluxio enables Spark streaming jobs to run 15x to 300x faster. In their case study, they described how Alluxio improved their system architecture, and mentioned that some existing Spark jobs would slow down or never finish because they ran out of memory. After adopting Alluxio, those jobs were able to finish, because the data could be stored in Alluxio instead of within Spark. In this blog, we show that saving RDDs in Alluxio keeps larger data sets in memory for faster Spark applications and enables sharing of RDDs across separate Spark applications.

Blog

Starburst Presto Alluxio Better Together for Presto Caching

Presto was designed from the ground up to offer interactive analytics using a massively parallel processing SQL engine that can combine data from multiple sources using a variety of connectors. As more and more companies discover the power of “separation of storage and compute” along with querying the data where it lies, it’s no wonder Presto is being asked to add even more functionality. Alluxio focuses its innovation at the data layer as a key enabling technology for Presto and a wide range of analytics applications and use cases. Performance is always critical, but providing memory-speed response time is only part of the solution. If the application can’t access the data, it’s of no use.

Blog

Announcing Alluxio v1.8.0

We are excited to announce the release of Alluxio Enterprise Edition (AEE), Community Edition (ACE), and Alluxio Open Source (AOS) v1.8.0. This release brings features and enhancements that simplify cloud adoption (including hybrid cloud and migration from HDFS to object storage) for analytics and machine learning, and that improve usability. To help make it easier to get started using Alluxio, we have also collected a set of resources into a starter kit. The second item in the kit is a simple tutorial on how to mount a remote AWS S3 bucket and accelerate data access.

Blog

Data Location Awareness Optimize Performance and Lower Cost with Tiered Locality

Caching frequently used data in memory is not a new computing technique; however, it is a concept that Alluxio has taken to the next level with the ability to aggregate data from multiple storage systems in a unified pool of memory. Alluxio’s capabilities extend further to intelligently managing the data within that virtual data layer. Tiered locality uses awareness of network topology and configurable policies to manage data placement for performance and cost optimizations. This feature is particularly useful with cloud deployments across multiple availability zones. It can also be useful for cost savings in environments where cross-zone or cross-location traffic is more expensive than intra-zone data traffic.
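As a rough illustration of the topology-aware placement idea described above, the sketch below ranks candidate workers by the most specific locality tier they share with a client. It is a minimal model under stated assumptions, not Alluxio's implementation or API: the tier names (`node`, `rack`, `zone`), the dictionary layout, and the functions `locality_rank` and `pick_worker` are all hypothetical.

```python
# Illustrative sketch only: tier names and data layout are hypothetical,
# not Alluxio's actual API or configuration.
LOCALITY_ORDER = ("node", "rack", "zone")  # most specific tier first


def locality_rank(client, worker, order=LOCALITY_ORDER):
    """Return the index of the first tier on which client and worker match.

    Lower is closer; len(order) means no tier matches at all.
    """
    for rank, tier in enumerate(order):
        value = client.get(tier)
        if value is not None and value == worker.get(tier):
            return rank
    return len(order)


def pick_worker(client, workers, order=LOCALITY_ORDER):
    """Choose the worker sharing the most specific tier with the client."""
    return min(workers, key=lambda w: locality_rank(client, w, order))


client = {"node": "host-a", "rack": "r1", "zone": "us-east-1a"}
workers = [
    {"name": "w1", "node": "host-x", "rack": "r9", "zone": "us-east-1b"},
    {"name": "w2", "node": "host-y", "rack": "r7", "zone": "us-east-1a"},
    {"name": "w3", "node": "host-z", "rack": "r1", "zone": "us-east-1a"},
]

# w3 shares the client's rack, which outranks w2's zone-only match.
print(pick_worker(client, workers)["name"])  # w3
```

Preferring a same-zone worker over a cross-zone one is what keeps the more expensive cross-zone traffic to a minimum in this model.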

Blog

Asynchronous Caching in Alluxio High Performance for Partial Read Caching for Presto and Spark

An Alluxio cluster caches data from connected storage systems in memory to create a data layer that can be accessed concurrently by multiple application frameworks. This greatly improves performance for many analytics workloads. On-demand caching occurs when clients read blocks of data using a ‘CACHE’ read type from persistent storage systems connected to the Alluxio cluster. Prior to Alluxio v1.7, on-demand caching was on the critical path of read operations, requiring a full block to be read before the data was available for the application. Workloads which read partial blocks, for example SQL workloads, would be adversely affected on initial reads from connected storage.
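The difference between caching on the critical path and caching asynchronously can be sketched with a toy model. This is not Alluxio code: the class `ToyBlockCache`, its method names, the 64-byte block size, and the `critical_path_bytes` counter are all assumptions made for illustration, and reads are assumed not to cross a block boundary.

```python
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 64  # toy block size in bytes (assumption; real blocks are far larger)


class ToyBlockCache:
    """Toy model of on-demand block caching; not Alluxio code."""

    def __init__(self, remote: bytes):
        self.remote = remote            # stands in for the connected storage
        self.cached = {}                # block index -> block bytes
        self.critical_path_bytes = 0    # remote bytes the caller had to wait for
        self._pool = ThreadPoolExecutor(max_workers=1)

    def _load_block(self, index: int) -> None:
        start = index * BLOCK_SIZE
        self.cached[index] = self.remote[start:start + BLOCK_SIZE]

    def _from_cache(self, index: int, offset: int, length: int) -> bytes:
        start = offset - index * BLOCK_SIZE
        return self.cached[index][start:start + length]

    def read_blocking(self, offset: int, length: int) -> bytes:
        """Cache the whole block first, then serve the request from it."""
        index = offset // BLOCK_SIZE
        if index not in self.cached:
            self._load_block(index)
            self.critical_path_bytes += BLOCK_SIZE
        return self._from_cache(index, offset, length)

    def read_async(self, offset: int, length: int):
        """Serve only the requested bytes; cache the block in the background."""
        index = offset // BLOCK_SIZE
        if index in self.cached:
            return self._from_cache(index, offset, length), None
        data = self.remote[offset:offset + length]
        self.critical_path_bytes += length
        return data, self._pool.submit(self._load_block, index)


remote = bytes(range(256))
blocking, background = ToyBlockCache(remote), ToyBlockCache(remote)

blocking.read_blocking(70, 4)               # waits for a full 64-byte block
data, pending = background.read_async(70, 4)  # waits for 4 bytes only
pending.result()                            # let the background caching finish

print(blocking.critical_path_bytes, background.critical_path_bytes)  # 64 4
```

In this model a 4-byte read waits on 64 bytes when caching blocks the read, and on only 4 bytes when the block is cached in the background, which is the effect a partial-read SQL workload would notice on first access.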

White Paper

A Case For Packing And Indexing In Cloud File Systems

Case Study

TalkingData

Leading Data Broker in China Leverages Alluxio to Unify Terabytes of Data Across Disparate Data Sources

Blog

TalkingData Leading Data Broker in China Leverages Alluxio to Unify Terabytes of Data Across Disparate Data Sources

TalkingData leverages Alluxio as a single platform to manage all the data across disparate data sources on-premise and in the cloud. Alluxio removes the complexity of our environment by abstracting the different data sources and providing a unified interface. Applications simply interact with Alluxio, and Alluxio manages data access to different storage systems on behalf of the applications. Alluxio effectively democratizes data access, allowing data scientists and analysts in various business units to accomplish their goals without needing to consider where the data is located or having to go to central IT or the engineering team to transfer or prepare the data.

Blog

Myntra Case Study Accelerating Analytics in the Cloud for Customized Mobile ECommerce

While looking for ways to streamline our data pipeline, we learned about Alluxio, an open source, memory speed, virtual distributed file system. We deployed Alluxio as the shared data layer for all of the intermediate stages in the data pipeline. By reading and writing data in Alluxio, the data can be read concurrently and stay in memory for the next stage of the pipeline. This increased the performance by speeding up the entire pipeline, and increased overall throughput of the pipeline allowing us to provide interactive response to our app users.

Case Study

Myntra

Accelerating Analytics in the Cloud for Mobile E-Commerce

Presentation

Using Alluxio as a Fault-Tolerant Pluggable Optimization Component to Compute Frameworks of JD System

STRATA DATA CONFERENCE LONDON 2018

JD.com is China’s largest online retailer and its biggest overall retailer, as well as the country’s biggest internet company by revenue. Currently, JD.com’s BDP platform runs more than 400,000 jobs (15+ PB) daily, on a system with more than 15,000 cluster nodes and a total capacity of 210 PB.

Alluxio, formerly Tachyon, is the world’s first system that unifies disparate storage systems at memory speed. In the big data ecosystem, Alluxio lies between computation frameworks or jobs and various kinds of storage systems. Additionally, Alluxio’s memory-centric architecture enables data access orders of magnitude faster than existing solutions.

Alluxio has run in JD.com’s production environment on 100 nodes for six months. Mao Baolong, Yiran Wu, and Yupeng Fu explain how JD.com uses Alluxio to provide support for ad hoc and real-time stream computing, using Alluxio-compatible HDFS URLs and Alluxio as a pluggable optimization component. To give just one example, the JDPresto framework has seen a 10x performance improvement on average. This work has also extended Alluxio and enhanced the syncing between Alluxio and HDFS for consistency.

Case Study

Tencent

Delivering Customized News to Over 100 Million Monthly Users

Blog

Tencent Case Study Delivering Customized News to Over 100 Million Users per Month with Alluxio

Tencent is one of the largest technology companies in the world and a leader in multiple sectors such as social networking, gaming, e-commerce, mobile and web portal. Tencent News, one of Tencent’s many offerings, strives to create a rich, timely news application to provide users with an efficient, high-quality reading experience. To provide the best experience to more than 100 million monthly active users of Tencent News, we leverage Alluxio with Apache Spark to create a scalable, robust, and performant architecture.

Blog

MOMO Accelerating Ad Hoc Analysis with Spark SQL and Alluxio

Alluxio clusters act as a data access accelerator for remote data in connected storage systems. Temporarily storing data in memory, or other media near compute, accelerates access and provides local performance from remote storage. This capability is even more critical with the movement of compute applications to the cloud and data being located in object stores separate from compute. Caching is transparent to users, using read/write buffering to maintain continuity with persistent storage. Intelligent cache management utilizes configurable policies for efficient data placement and supports tiered storage for both memory and disk (SSD/HDD).

White Paper

Whitepaper: MOMO – Accelerating Ad Hoc Analysis with Spark SQL and Alluxio

Case Study

Hedge Fund

Hedge Fund Improves Machine Learning Model Performance 4X with Alluxio

Blog

Lenovo Case Study Analytics on Data from Multiple Locations and Eliminating ETL

Lenovo is an Alluxio customer with a common problem and use case in the world of data analytics. They have petabytes of data in multiple data centers in different geographic locations. Analyzing it requires an ETL process to get all of the data in the right place. This is both slow, because data has to be transferred across the network, and costly, because multiple copies of the data need to be stored. Freshness and quality can also suffer: the data may be out of date, and it may be incomplete because regulatory issues prevent certain data from being transferred.

Case Study

Lenovo

Lenovo Analyzes Petabytes of Smartphone Data from Multiple Locations and Eliminates ETL with Alluxio

Blog

New Whitepaper Structured Big Data Federation

Alluxio helps organizations handle their big data by providing a unified view of all of the data in the enterprise, whether on premises, in the cloud, or hybrid. Applications access data using a standard interface to a global virtual namespace. Alluxio also employs a memory-centric architecture to enable data access at memory speed. With the combined unification and performance benefits, Alluxio can effectively provide big data federation for organizations by acting as a virtual data lake.

White Paper

Structured Big Data Federation Using Alluxio
