Presto Archives

A Deep Dive into Caching in Presto

October 11, 2023 By Hope Wang and Beinan Wang

This article was originally posted on InfoWorld. Understand the caching mechanisms of the popular distributed SQL engine and how to use them to improve query speed and efficiency. Presto is a popular, open source, distributed SQL engine that enables organizations to run interactive analytic queries on multiple data sources at large scale. Caching is a typical optimization …

A Deep Dive into the Call Chain Relationship Between Presto, Hive, and Alluxio

September 11, 2023 By Jiaming Mai

Alluxio is commonly used with Presto and Hive to accelerate queries. Understanding how Presto, Hive, and Alluxio work together, and how a SQL query flows down to low-level file system operations, is key to tuning performance. This post dives into the relationship between Presto, Hive, and Alluxio. We will walk you through how a SQL query executes in …

The Power of Data Orchestration: Storage Acceleration and Servitization at Shopee

September 27, 2022

Shopee is the leading e-commerce platform in Southeast Asia. In this presentation, Luo Li from Shopee shares the Data Infra team’s recent project on Presto acceleration and storage servitization. He details how Shopee leverages Alluxio to accelerate Presto queries and to provide standardized methods of accessing data through Alluxio-Fuse and Alluxio-S3.

Tags: alluxio day, presto, shopee

ML-Based SQL Query Resource Usage Prediction

September 15, 2022

In the Big Data era, calculating the resource usage of a SQL query is usually computationally expensive. Can we estimate the resource usage of SQL queries more efficiently, without any computation in the SQL engine kernel? In this session, Chunxu and Beinan introduce how Twitter’s data platform leverages a machine learning-based approach in Presto and BigQuery to estimate query resource utilization with over 90% accuracy.

Tags: alluxio day, big data, machine learning, presto, sql, twitter

Avoid Data Silos in Presto in Meta: the journey from Raptor to RaptorX

August 29, 2022 By Rongrong Zhong

This blog was originally published on the Presto blog: https://prestodb.io/blog/2022/01/28/avoid-data-silos-in-presto-in-meta. Authors: Rongrong Zhong (Alluxio); James Sun and Ke Wang (Meta). Raptor is a Presto connector (presto-raptor) that powers some critical interactive query workloads at Meta (previously Facebook). Though it is referred to in the ICDE 2019 paper "Presto: SQL on Everything", it remains somewhat mysterious to many Presto users …

Designing the Presto Local Cache at Uber | A collaboration between Uber and Alluxio – part 2

May 31, 2022 By Chen Liang and Beinan Wang

In the previous blog, we introduced Uber’s Presto use cases and how we collaborated to implement the Alluxio local cache to overcome the challenges of accelerating Presto queries. This second part discusses improvements to the local cache metadata.

Speed Up Uber’s Presto with Alluxio | A collaboration between Uber and Alluxio – part 1

May 24, 2022 By Chen Liang and Beinan Wang

This article shares how Uber and Alluxio collaborated to design and implement a Presto local cache that reduces HDFS latency.

The Power of Data Orchestration: Storage Acceleration and Servitization at Shopee

April 28, 2022

Shopee is the leading e-commerce platform in Southeast Asia. In this presentation, Tianbao Ding and Haoning Sun from Shopee share the Data Infra team’s recent project on Presto acceleration and storage servitization. They detail how Shopee leverages Alluxio to accelerate Presto queries and to provide standardized methods of accessing data through Alluxio-Fuse and Alluxio-S3.

Tags: alluxio day, fuse, presto, s3, shopee

Improving Presto Architectural Decisions with Alluxio Shadow Cache at Meta (Facebook)

March 30, 2022 By Ke Wang and Zhenyu Song

Through a collaboration between Meta (Facebook), Princeton University, and Alluxio, we have developed “Shadow Cache”, a lightweight Alluxio component that tracks the working set size and the hit ratio an infinitely large cache would achieve. Shadow Cache dynamically tracks the working set size over a recent time window and is implemented with a series of Bloom filters. It is deployed in Presto at Meta (Facebook), where it is used to understand system bottlenecks and to inform routing design decisions.
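To make the Bloom-filter idea concrete, here is a minimal Python sketch, assuming a ring of per-time-segment Bloom filters: each request is recorded in the current segment's filter, a request counts as a hit for an infinitely large cache if any filter in the window may already contain the page, and the oldest segment is discarded when the window advances. The working set size is estimated from the bitwise OR of all filters using the standard Bloom-filter cardinality estimate n ≈ -(m/k)·ln(1 - X/m), where m is the number of bits, k the number of hash functions, and X the number of set bits. The names `ShadowCacheSketch`, `access`, and `advance_window` are hypothetical and are not Alluxio's API; the real Shadow Cache may differ in hashing, sizing, and segment expiry.

```python
import hashlib
import math


class BloomFilter:
    """Simple Bloom filter over a fixed-size bit array (illustrative only)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit, favoring clarity over compactness.
        self.bits = bytearray(num_bits)

    def _positions(self, key: str) -> list:
        # Derive k bit positions from one SHA-256 digest via double hashing.
        digest = hashlib.sha256(key.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd step
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: str) -> bool:
        # May return a false positive, never a false negative.
        return all(self.bits[pos] for pos in self._positions(key))


class ShadowCacheSketch:
    """Hypothetical sliding-window tracker built from a ring of Bloom filters."""

    def __init__(self, num_segments: int, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.filters = [BloomFilter(num_bits, num_hashes) for _ in range(num_segments)]
        self.current = 0  # index of the segment receiving new accesses
        self.requests = 0
        self.hits = 0

    def access(self, page_id: str) -> bool:
        # An infinitely large cache hits if the page was seen anywhere in the window.
        hit = any(f.might_contain(page_id) for f in self.filters)
        self.requests += 1
        if hit:
            self.hits += 1
        self.filters[self.current].add(page_id)
        return hit

    def advance_window(self) -> None:
        # The next slot in the ring holds the oldest segment; replace it with a fresh filter.
        self.current = (self.current + 1) % len(self.filters)
        self.filters[self.current] = BloomFilter(self.num_bits, self.num_hashes)

    def working_set_size(self) -> float:
        # Estimate distinct pages in the window from the OR of all filters:
        # n ~= -(m / k) * ln(1 - X / m), where X is the number of set bits.
        set_bits = sum(
            any(f.bits[i] for f in self.filters) for i in range(self.num_bits)
        )
        if set_bits == self.num_bits:
            return float("inf")  # filters saturated; estimate undefined
        return -(self.num_bits / self.num_hashes) * math.log(1 - set_bits / self.num_bits)

    def hit_ratio(self) -> float:
        return self.hits / self.requests if self.requests else 0.0
```

Because Bloom filters admit false positives, both the hit ratio and the working set estimate can be slightly inflated; larger bit arrays reduce that error at the cost of memory, which is the trade-off that keeps this kind of tracker lightweight compared with recording every page ID.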

Tag: presto