Presto on Alluxio: How Netease Games leveraged Alluxio to boost ad hoc SQL on HDFS

Netease Games operates many popular online games in China, such as “World of Warcraft” and “Hearthstone”. Netease Games has also developed quite a few popular games of its own, such as “Fantasy Westward Journey 2”, “Westward Journey 2”, “World 3”, and “League of Immortals”. The strong growth of the business drives the demand to build and maintain a data platform that handles a massive amount of data and delivers insights from it promptly. Given our data scale, it is very challenging to support high-performance ad hoc queries on the data and return results in a timely manner.

Starburst Presto + Alluxio = Better Together for Presto Caching

Presto was designed from the ground up to offer interactive analytics using a massively parallel processing SQL engine that can combine data from multiple sources using a variety of connectors. As more and more companies discover the power of “separation of storage and compute” along with querying the data where it lies, it’s no wonder Presto is being asked to add even more functionality.
Alluxio focuses its innovation at the data layer as a key enabling technology for Presto and a wide range of analytics applications and use cases. Performance is always critical, but providing memory-speed response times is only part of the solution. If the application can’t access the data, the speed is of no use.

MOMO: Accelerating Ad Hoc Analysis with Spark SQL and Alluxio

Alluxio clusters act as a data access accelerator for remote data in connected storage systems. Temporarily storing data in memory, or other media near compute, accelerates access and provides local performance from remote storage. This capability is even more critical with the movement of compute applications to the cloud and data being located in object stores separate from compute. Caching is transparent to users, using read/write buffering to maintain continuity with persistent storage. Intelligent cache management utilizes configurable policies for efficient data placement and supports tiered storage for both memory and disk (SSD/HDD).
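
The caching behavior described above can be illustrated with a small sketch. This is not Alluxio code and does not use any Alluxio API: the class `TieredCache`, the tier names `MEM` and `DISK`, and the `fetch_remote` callback are all hypothetical names chosen for illustration, and LRU is assumed as the eviction policy simply because it is a common choice. The sketch shows a read-through cache in which data evicted from the memory tier is demoted to a disk tier rather than dropped.

```python
from collections import OrderedDict
from typing import Callable, Dict, Tuple


class TieredCache:
    """Toy two-tier read-through cache (hypothetical; not Alluxio's implementation)."""

    def __init__(self, fetch_remote: Callable[[str], bytes],
                 mem_capacity: int, disk_capacity: int) -> None:
        # fetch_remote stands in for a read from the remote store (assumption).
        self.fetch_remote = fetch_remote
        self.capacity = {"MEM": mem_capacity, "DISK": disk_capacity}
        # Each OrderedDict keeps its least recently used key at the front.
        self.tiers: Dict[str, "OrderedDict[str, bytes]"] = {
            "MEM": OrderedDict(),
            "DISK": OrderedDict(),
        }

    def _put_mem(self, key: str, value: bytes) -> None:
        mem, disk = self.tiers["MEM"], self.tiers["DISK"]
        mem[key] = value
        mem.move_to_end(key)
        # Demote LRU entries from memory to disk when memory is over capacity.
        while len(mem) > self.capacity["MEM"]:
            old_key, old_value = mem.popitem(last=False)
            disk[old_key] = old_value
            disk.move_to_end(old_key)
            # Drop LRU entries from disk entirely when disk is over capacity.
            while len(disk) > self.capacity["DISK"]:
                disk.popitem(last=False)

    def read(self, key: str) -> Tuple[bytes, str]:
        """Return (data, source), where source is 'MEM', 'DISK', or 'REMOTE'."""
        mem, disk = self.tiers["MEM"], self.tiers["DISK"]
        if key in mem:
            mem.move_to_end(key)  # refresh recency on a memory hit
            return mem[key], "MEM"
        if key in disk:
            value = disk.pop(key)  # promote a disk hit back into memory
            self._put_mem(key, value)
            return value, "DISK"
        value = self.fetch_remote(key)  # miss in both tiers: go to remote storage
        self._put_mem(key, value)
        return value, "REMOTE"
```

A caller only ever invokes `read`, which is the sense in which the caching is transparent: the same call returns the same bytes whether they came from memory, disk, or the remote store, and only the latency differs.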

Whitepaper: MOMO – Accelerating Ad Hoc Analysis with Spark SQL and Alluxio

From our friends at MOMO: The Hadoop ecosystem makes many distributed systems and algorithms easier to use and generally lowers the cost of operations. However, enterprises and vendors are never satisfied with that, so higher performance becomes the next issue. We considered several options to address our performance needs and focused our efforts on Alluxio, which improves performance …

Enabling Decoupled Compute and Storage with Alluxio

The primary appeal of a coupled compute-storage architecture, in which computation happens on the machines where the data resides, is the performance gained by bringing the compute engine to the data it requires. However, the costs of maintaining such tightly coupled architectures are gradually overtaking the performance benefits. Especially with the popularity of cloud resources, being able to scale compute and storage independently results in large cost savings and cheaper maintenance. In addition, data has become the new oil, and all modern organizations are looking to capture as much data as possible.