This article describes how JD built this interactive OLAP platform combining two open-source technologies: Presto and Alluxio.
Bay Area Meetup which include presentations on the architecture of Presto, its separation of compute and storage, cloud-readiness, recent advancements in the project such as Cost-Based Optimizer and Kubernetes Support. Presto and Alluxio production use cases and more.
Kubernetes is widely used across enterprises to orchestrate computation. And while Kubernetes helps improve flexibility and portability for computation in public/hybrid cloud environments across infrastructure providers, running data-intensive workloads can be challenging.
When it comes to efficiently moving data closer to Spark or Presto frameworks, co-locating data with these frameworks and accessing data from multiple or remote clouds is hard to do. That’s where Alluxio, an open source data orchestration platform, can help.
Alluxio enables data locality with your Spark and Presto workloads for faster performance and better data accessibility in Kubernetes. It also provides portability across storage providers.
In this on demand tech talk we’ll give a quick overview of Alluxio and the use cases it powers for Spark/Presto in Kubernetes. We’ll show you how to set up Alluxio and Spark/Presto to run in Kubernetes as well.
Many organizations have taken advantage of the scalability and cost-savings of cloud computing as well as cloud storage services to meet their data-powered workload demands. In addition, as data is increasingly siloed and lives everywhere, there’s a need for data orchestration to bring the needed data closer to compute. With Alluxio’s data orchestration platform, bring back data locality for your compute with in-memory & tiered data access.
This event features leading financial services company ING Bank’s user story on how they leverage open source technologies like Presto and Alluxio with S3.
The Alluxio-Presto sandbox is a docker application featuring installations of MySQL, Hadoop, Hive, Presto, and Alluxio. The sandbox lets you easily dive into an interactive environment where you can explore Alluxio, run queries with Presto, and see the performance benefits of using Alluxio in a big data software stack.
Presto and Spark are CPU-bound so they require CPU intensive instances. But on the other hand, they also need memory so the R4/R5 instances are what most users end up using for their Presto/Spark workloads. The memory itself will get distributed across Presto/Spark and Alluxio, and typically we see about 60% going to compute, 30% … Continued
Alluxio is a proud sponsor and exhibitor at the Presto Summit in San Francisco.
What’s Presto Summit? It’s the leading Presto conference co-organized by our partner Starburst Data and the Presto Software Foundation.
This whitepaper details how to leverage any public cloud (AWS, Google Cloud Platform, or Microsoft Azure) to scale analytics workloads directly on on-prem data without copying and synchronizing the data into the cloud. We will show an example of what it might look like to run on-demand Starburst Presto, Spark, and Hive with Alluxio in the public cloud using on-prem HDFS.
The paper also includes a real world case study on a leading hedge fund based in New York City, who deployed large clusters of Google Compute Engine VMs with Spark and Alluxio using on-prem HDFS as the underlying storage tier.