China Unicom Uses Alluxio and Spark to Build New Computing Platform to Serve Mobile Users

Abstract: China Unicom is one of the five largest telecom operators in the world. China Unicom’s booming business in 4G and 5G networks has to serve an exploding base of hundreds of millions of smartphone users. This unprecedented growth brought enormous challenges and new requirements to the data processing infrastructure. The previous generation of its … Continued

Achieving 10x acceleration of Spark and Hive Jobs on AWS S3 with Alluxio Tiered Storage

The data engineering team at Bazaarvoice, a software-as-a-service digital marketing company based in Austin, Texas, must handle data at Internet scale to serve its customers. Facing challenges with scaling up storage capacity and provisioning hardware, the team turned to Alluxio’s tiered storage system and saw 10x acceleration of its Spark and Hive jobs running on AWS S3.

In this whitepaper you’ll learn:

  • How to build a big data analytics platform on AWS that includes technologies like Hive, Spark, Kafka, Storm, Cassandra, and more
  • How to set up a Hive metastore using a storage tier for hot tables
  • How to leverage tiered storage for maximized read performance


Accelerate Spark and Hive Jobs on AWS S3 by 10x with Alluxio Tiered Storage

In this article, Thai Bui from Bazaarvoice describes how Bazaarvoice leverages Alluxio to build a tiered storage architecture with AWS S3 that maximizes performance and minimizes the operating cost of running big data analytics on AWS EC2. This blog is an abbreviated version of the full-length technical whitepaper (coming soon), which aims to provide the following takeaways: Common … Continued

One Click to Benchmark Spark + Alluxio + S3 Stack with TPC-DS queries on AWS

The Alluxio sandbox is the easiest way to test drive the popular data analytics stack of Spark, Alluxio, and S3 deployed in a multi-node cluster in a public cloud environment. The sandbox cluster is fully configured and ready for users to run applications ranging from the hello-world example to the TPC-DS benchmark suite. Don’t take our word … Continued

Presto on Alluxio: How Netease Games leveraged Alluxio to boost ad hoc SQL on HDFS

Author: Shuang Li (Shuang is a big data engineer at Netease Games, developing and maintaining OLAP-related solutions in the data warehouse. He works closely on Apache Kylin and Presto as well as HBase. Shuang graduated from South China University of Technology.)

Background: As one of the world’s leading online game companies, Netease Games is … Continued

Using Alluxio as a Fault-tolerant Pluggable Optimization Component of JD.com’s Computation Frameworks

JD.com is China’s largest online retailer and its biggest overall retailer, as well as the country’s biggest internet company by revenue. Currently, JD.com’s BDP platform runs more than 400,000 jobs (15+ PB) daily, on a system with more than 15,000 cluster nodes and a total capacity of 210 PB.

Alluxio has run in JD.com’s production environment on 100 nodes for six months. See how JD.com uses Alluxio to provide support for ad hoc and real-time stream computing, using Alluxio-compatible HDFS URLs and Alluxio as a pluggable optimization component.
