Speed Up Uber’s Presto with Alluxio | A collaboration between Uber and Alluxio – part 1
This article shares how Uber and Alluxio collaborated to design and implement Presto local cache to reduce HDFS latency.
Shopee is the leading e-commerce platform in Southeast Asia. In this presentation, Tianbao Ding and Haoning Sun from Shopee share their Data Infra team’s recent project on Presto acceleration and storage servitization. They detail how Shopee leverages Alluxio to accelerate Presto queries and to provide a standardized method of accessing data through Alluxio-Fuse and Alluxio-S3.
Tags: alluxio day, fuse, presto, s3, shopee
Through a collaboration among Meta (Facebook), Princeton University, and Alluxio, we have developed “Shadow Cache”, a lightweight Alluxio component that tracks the working set size and the hit ratio an infinite-size cache would achieve. Shadow cache dynamically tracks the working set size over a past time window and is implemented as a series of Bloom filters. It is deployed in Meta (Facebook) Presto, where its output is used to understand system bottlenecks and to inform routing design decisions.
This talk covers how Uber’s Presto team implemented cache invalidation and a dashboard for Alluxio’s Local Cache. Liang Chen will also share his experience using a customized cache filter to resolve performance degradation caused by a large working set.
Tags: alluxio day, local cache, performance, presto, uber
This article highlights synergy between the two widely adopted open-source projects, Alluxio and Presto, and demonstrates how together they deliver a self-serve data architecture across clouds.
Running Presto with Alluxio is gaining popularity in the community. It avoids the long latency of reading data from remote storage by using SSD or memory to cache hot datasets close to Presto workers. Presto supports hash-based soft affinity scheduling to ensure that only one or two copies of the same data are cached in the entire cluster, which improves cache efficiency by allowing more hot data to be cached locally. The hashing algorithm currently used, however, does not work well when the cluster size changes. This article introduces a new hashing algorithm for soft affinity scheduling, consistent hashing, to address this problem.
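To illustrate why the choice of hashing algorithm matters when the cluster size changes, here is a minimal sketch comparing a simple modulo-based assignment with a generic consistent hash ring that uses virtual nodes. It is not Presto's implementation: the names `modulo_assign`, `ConsistentHashRing`, and `moved_fraction`, the worker names, and the choice of 100 virtual nodes per worker are all assumptions made for illustration.

```python
import bisect
import hashlib


def stable_hash(key: str) -> int:
    """Deterministic 64-bit hash (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


def modulo_assign(path: str, workers: list) -> str:
    """Hypothetical modulo-based affinity: hash the path, index into the worker list."""
    return workers[stable_hash(path) % len(workers)]


class ConsistentHashRing:
    """Generic consistent hash ring with virtual nodes (illustrative only)."""

    def __init__(self, workers, virtual_nodes=100):
        # Each worker is placed on the ring at many pseudo-random points.
        self._ring = sorted(
            (stable_hash(f"{worker}#{i}"), worker)
            for worker in workers
            for i in range(virtual_nodes)
        )
        self._points = [point for point, _ in self._ring]

    def assign(self, path: str) -> str:
        # Walk clockwise to the first virtual node after the path's hash,
        # wrapping around to the start of the ring if necessary.
        idx = bisect.bisect_right(self._points, stable_hash(path)) % len(self._points)
        return self._ring[idx][1]


def moved_fraction(assign_before, assign_after, paths) -> float:
    """Fraction of paths whose preferred worker differs between two assignments."""
    moved = sum(1 for p in paths if assign_before(p) != assign_after(p))
    return moved / len(paths)


if __name__ == "__main__":
    workers = [f"worker-{i}" for i in range(10)]
    grown = workers + ["worker-10"]  # one worker joins the cluster
    paths = [f"hdfs://warehouse/table/part-{i}.orc" for i in range(2000)]

    before, after = ConsistentHashRing(workers), ConsistentHashRing(grown)
    print("modulo moved:    ", moved_fraction(
        lambda p: modulo_assign(p, workers),
        lambda p: modulo_assign(p, grown),
        paths,
    ))
    print("consistent moved:", moved_fraction(before.assign, after.assign, paths))
```

With modulo assignment, adding one worker changes the preferred worker for most paths, so most cached data ends up on the "wrong" node. On the ring, only the paths claimed by the new worker move, which is roughly its fair share of the keys.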
Many companies have leveraged Alluxio to level up their current Presto platform, including Facebook, TikTok, Electronic Arts, Walmart, Tencent, Comcast, and more. They have gained significant benefits with Alluxio integrated into their Presto stack.
Tags: architecture, cloud, data orchestration, interactive queries, presto, spark, storage
Alluxio is a data orchestration platform that unifies data silos across heterogeneous environments. This blog discusses an architecture that combines Spark with Alluxio.
This talk describes the design of shadow cache, a lightweight component that tracks the working set size of the Alluxio cache. Shadow cache dynamically tracks the working set size over a past time window and is implemented as a series of Bloom filters. We’ve deployed the shadow cache in Facebook Presto and use its results to understand system bottlenecks and to inform routing design decisions.
Tags: alluxio day, architecture, cache, facebook, presto, shadow cache
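The shadow cache idea summarized above (a series of Bloom filters covering a sliding time window) can be sketched roughly as follows. This is an illustrative toy, not Alluxio's implementation: the class names, the filter sizes, the explicit `rotate()` call standing in for a timer, and the choice to count items rather than bytes are all assumptions. The working set size is estimated from the number of set bits using the standard Bloom filter cardinality estimate n ≈ -(m/k) · ln(1 - X/m).

```python
import hashlib
import math
from collections import deque


class BloomFilter:
    """Small Bloom filter backed by a Python int used as a bit array."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, key: str):
        # Double hashing: derive k bit positions from two 64-bit hashes.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def __contains__(self, key: str) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class ShadowCacheSketch:
    """Hypothetical sliding-window tracker: one Bloom filter per time slice."""

    def __init__(self, num_slices=4, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # A deque with maxlen silently drops the oldest slice on append.
        self.filters = deque([BloomFilter(num_bits, num_hashes)], maxlen=num_slices)
        self.hits = 0
        self.accesses = 0

    def access(self, key: str) -> None:
        """Record an access; it counts as a hit if any slice in the window saw the key."""
        self.accesses += 1
        if any(key in f for f in self.filters):
            self.hits += 1
        self.filters[-1].add(key)

    def rotate(self) -> None:
        """Start a new time slice (a real system would call this on a timer)."""
        self.filters.append(BloomFilter(self.num_bits, self.num_hashes))

    def working_set_size(self) -> float:
        """Estimate the number of distinct keys seen within the window."""
        merged = 0
        for f in self.filters:
            merged |= f.bits  # union of all slices
        set_bits = bin(merged).count("1")
        if set_bits == 0:
            return 0.0
        if set_bits >= self.num_bits:
            return float("inf")  # filter saturated; estimate is unbounded
        return -(self.num_bits / self.num_hashes) * math.log(1 - set_bits / self.num_bits)

    def hit_ratio(self) -> float:
        """Hit ratio a cache with unlimited capacity would see over the window."""
        return self.hits / self.accesses if self.accesses else 0.0
```

Because the filters store only bits, not the cached data, the memory cost stays small and independent of object size. The trade-off is that Bloom filter false positives can slightly inflate the reported hit ratio.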