Case Study Archives

Achieving 10x acceleration of Spark and Hive Jobs on AWS S3 with Alluxio Tiered Storage

February 20, 2019

The data engineering team at Bazaarvoice, a software-as-a-service digital marketing company based in Austin, Texas, must handle data at massive Internet scale to serve its customers. Facing challenges with scaling up storage capacity and provisioning hardware, the team turned to Alluxio’s tiered storage system and saw a 10x acceleration of its Spark and Hive jobs running on AWS S3.

In this whitepaper you’ll learn:

How to build a big data analytics platform on AWS that includes technologies like Hive, Spark, Kafka, Storm, Cassandra, and more
How to set up a Hive metastore using a storage tier for hot tables
How to leverage tiered storage for maximized read performance

Tags: apache hive, apache spark, aws s3, benchmark, case study, performance, tiered storage

Accelerate Spark and Hive Jobs on AWS S3 by 10x with Alluxio as a Tiered Storage Solution

February 20, 2019 By Thai Bui

In this article, Thai Bui from Bazaarvoice describes how Bazaarvoice leverages Alluxio to build a tiered storage architecture with AWS S3, maximizing performance and minimizing the operating costs of running big data analytics on AWS EC2.

Presto on Alluxio: How Netease Games leveraged Alluxio to boost ad hoc SQL on HDFS

January 11, 2019 By Shuang Li

Netease Games is the operator of many popular online games in China, such as “World of Warcraft” and “Hearthstone”. Netease Games has also developed quite a few popular games of its own, such as “Fantasy Westward Journey 2”, “Westward Journey 2”, “World 3”, and “League of Immortals”. The strong growth of the business drives the demand to build and maintain a data platform that handles a massive amount of data and delivers insights from it promptly. Given our data scale, it is very challenging to support high-performance ad hoc queries that return results in a timely manner.

AVA – Qiniu AI Lab, CTrip, and Sogou Use Cases [Chinese]

October 1, 2018

Learn more about how Alluxio is used in the AVA deep learning platform, in Ctrip’s big data platform, and at Sogou.

Tags: big data, case study, hive, machine learning, spark

Using Alluxio as a Fault-tolerant Pluggable Optimization Component of JD.com’s Computation Frameworks

September 14, 2018 by Bing Bai & Tao Huang [JD.com]

Strata NY 2018 – Learn how to use Alluxio as a pluggable optimization component. Understand how JD.com uses Alluxio to provide support for ad hoc and real-time stream computing while ensuring consistency between Alluxio and HDFS.

Tags: apache hadoop, benchmark, case study, compute storage separation, hdfs, presto

Hybrid Collaborative Tiered Storage with Alluxio

September 14, 2018 by Thai Bui [Bazaarvoice]

See how Spark and Hive jobs running on AWS S3 achieved a 10x performance improvement. Plus, learn how real-world user Bazaarvoice implemented a tiered storage architecture to get that boost.

Tags: apache spark, aws s3, case study, meetup, tiered storage

TalkingData Case Study: Leading Data Broker in China Leverages Alluxio to Unify Terabytes of Data Across Disparate Data Sources

June 26, 2018

TalkingData, a leading data broker in China, provides data intelligence solutions and processes over 20 terabytes of data and more than one billion session requests per day. TalkingData deployed Alluxio to unify disparate cloud, on-premise, and hybrid data sources for a range of analytics applications. The architecture provides self-service data access for data scientists and engineers, eliminating the need for ETL or manual IT assistance.

Tags: analytics, architecture, case study, compute storage separation, hybrid cloud

TalkingData: Leading Data Broker in China Leverages Alluxio to Unify Terabytes of Data Across Disparate Data Sources

June 25, 2018 By Zhitao Yan (TalkingData)

TalkingData leverages Alluxio as a single platform to manage all the data across disparate data sources on-premise and in the cloud. Alluxio removes the complexity of our environment by abstracting the different data sources and providing a unified interface. Applications simply interact with Alluxio, and Alluxio manages data access to different storage systems on behalf of the applications. Alluxio effectively democratizes data access, allowing data scientists and analysts in various business units to accomplish their goals without needing to consider where the data is located or having to go to central IT or the engineering team to transfer or prepare the data.

Tag: case study