The Alluxio-Presto sandbox is a docker application featuring installations of MySQL, Hadoop, Hive, Presto, and Alluxio. The sandbox lets you easily dive into an interactive environment where you can explore Alluxio, run queries with Presto, and see the performance benefits of using Alluxio in a big data software stack.
Presto and Spark are CPU-bound so they require CPU intensive instances. But on the other hand, they also need memory so the R4/R5 instances are what most users end up using for their Presto/Spark workloads. The memory itself will get distributed across Presto/Spark and Alluxio, and typically we see about 60% going to compute, 30% … Continued
Alluxio is a proud sponsor and exhibitor at the Presto Summit in San Francisco.
What’s Presto Summit? It’s the leading Presto conference co-organized by our partner Starburst Data and the Presto Software Foundation.
This whitepaper details how to leverage any public cloud (AWS, Google Cloud Platform, or Microsoft Azure) to scale analytics workloads directly on on-prem data without copying and synchronizing the data into the cloud. We will show an example of what it might look like to run on-demand Starburst Presto, Spark, and Hive with Alluxio in the public cloud using on-prem HDFS.
The paper also includes a real world case study on Two Sigma, a leading hedge fund based in New York City, who deployed large clusters of Google Compute Engine VMs with Spark and Alluxio using on-prem HDFS as the underlying storage tier.
This datasheet introduces the Presto + Alluxio Solution. Alluxio enables caching for Presto as well as hybrid deployments.
Learn how to set up Presto with Alluxio such that Presto jobs can seamlessly read from and write to S3.
Compare the performance between Presto on S3 with Presto and Alluxio on S3.
Announcing the OEM partnership with Alluxio and Starburst Data, the company behind Presto, the fastest growing SQL query engine in a disaggregated world.
Some people experience serious performance issue in HDFS namenode (v2.7) response time. Particularly during peak traffic time, an HDFS namenode can become overloaded and some DFS operations (like listing a directory) can take a long time, which affects the query response time for Presto and other Hadoop applications. To solve for challenges in high latency … Continued
Problem If you have hundreds of external tables defined in Hive, what is the easist way to change those references to point to new locations? That is a fairly normal challenge for those that want to integrate Alluxio into their stack. A typical setup that we will see is that users will have Spark-SQL or … Continued