Today, when we create a Hive table, it is common to partition the table by value or range to improve query performance and reduce maintenance cost. However, Hive cannot directly access a single table with a single query when that table's data is spread across different storage media and … Continued
This presentation focuses on how Alluxio enables the big data analytics stack to be cloud-native. Today’s cloud object storage systems provide more cost-effective and scalable storage, but they also have different semantics and performance implications compared to HDFS. Applications like Spark or Presto do not benefit from node-level locality or cross-job caching when retrieving data from cloud object storage. Deploying Alluxio as the access layer for cloud storage solves these problems because data is retrieved and cached in Alluxio rather than fetched repeatedly from the underlying cloud or object storage.
Author: Shuang Li (Shuang is a big data engineer at Netease Games, developing and maintaining OLAP-related solutions in the data warehouse. He works closely on Apache Kylin and Presto as well as HBase. Shuang graduated from South China University of Technology.) Background: As one of the world’s leading online game companies, Netease Games is … Continued
Today’s enterprises are decoupling storage and compute as they migrate to the cloud, and that’s where Alluxio comes in. Alluxio is the data orchestration layer between storage and compute, bringing your data closer to your Presto workloads for better performance on top of S3.
See how Presto + Alluxio gives you the performance needed for your compute, regardless of where it is – in the cloud or on-premise.
The following is a guest post from our friends at Starburst Data. With more companies using Presto for reporting and analytics, we here at Starburst are seeing more use cases around operational reporting. These types of queries need to return in under a second and usually involve a small subset of the dataset. Presto was designed from the … Continued
This post is guest-authored by our friends at MOMO: Haojun (Reid) Chan and Wenchun Xu. Data Analysis Trends: The Hadoop ecosystem makes many distributed systems and algorithms easier to use and generally lowers the cost of operations. However, enterprises and vendors are never satisfied with that, so higher performance becomes the next issue. We considered several options … Continued
From our friends at MOMO: The Hadoop ecosystem makes many distributed systems and algorithms easier to use and generally lowers the cost of operations. However, enterprises and vendors are never satisfied with that, so higher performance becomes the next issue. We considered several options to address our performance needs and focused our efforts on Alluxio, which improves performance … Continued
Enabling Decoupled Compute and Storage with Alluxio: This blog explores the benefits Alluxio brings to data platforms, including: the trends behind the rise of decoupled compute-storage architectures; how Alluxio addresses data access issues for decoupled compute-storage architectures; and an example of Alluxio’s benefits using a SparkSQL workload. Motivation: The primary appeal of a coupled compute-storage architecture, … Continued