This datasheet introduces the Presto + Alluxio Solution. Alluxio enables caching for Presto as well as hybrid deployments.
Learn how to set up Presto with Alluxio such that Presto jobs can seamlessly read from and write to S3.
Compare the performance of Presto on S3 with that of Presto with Alluxio on S3.
Announcing the OEM partnership between Alluxio and Starburst Data, the company behind Presto, the fastest-growing SQL query engine in a disaggregated world.
Some people experience serious performance issues with HDFS namenode (v2.7) response time. Particularly during peak traffic, an HDFS namenode can become overloaded, and some DFS operations (like listing a directory) can take a long time, which affects query response time for Presto and other Hadoop applications. To solve for challenges in high latency … Continued
Problem: If you have hundreds of external tables defined in Hive, what is the easiest way to change those references to point to new locations? This is a fairly common challenge for those who want to integrate Alluxio into their stack. A typical setup that we will see is that users will have Spark-SQL or … Continued
How do we access AWS S3 data when running Presto in an on-premises environment, and how can we do it efficiently to reduce both egress costs and query runtimes? Alluxio as a local cache for Presto queries against remote AWS S3 data sources. As we move toward more and more decoupled environments, one of the things … Continued
Many organizations are leveraging EMR to run big data analytics on the public cloud. However, reading and writing data to S3 directly can result in slow and inconsistent performance. Alluxio is a data orchestration layer for the cloud, and in this use case it caches data for S3, ensuring high and predictable performance as well as reduced network traffic.
In this talk, we will focus on Alluxio's design: its architecture, data flow, and metadata flow. We will dive into the choices in its design space and share our experience implementing features like data tiering, storage options, and cache eviction policies. We will also share lessons in design, implementation, and operation from building an open source distributed storage system with 900 contributors over 5+ years.
Wenbo Zhao (Two Sigma) and Bin Fan (Alluxio) will be presenting on how Two Sigma uses Alluxio to make data-intensive compute independent of the storage beneath.