Tutorial: Getting Started with Starburst Presto & Alluxio on AWS Using a CloudFormation Template
5 min Tutorial
This tutorial walks you through the steps to create a Starburst Presto and Alluxio cluster using a combined AWS CloudFormation template (CFT). The CFT lets you easily dive into an interactive environment where you can explore Alluxio, run queries with Presto, and experience the performance benefits of using Alluxio in a big data software stack. It also shows how Alluxio can improve Presto’s query performance: Presto reads through Alluxio to access locally cached copies of data stored in an Amazon S3 data lake.
- Deploying an Alluxio and Presto joint cluster
- Access the Alluxio Presto cluster
- Explore Alluxio
- Run queries with Presto on Alluxio
- Presto on Alluxio advantages
Familiarity with EC2 and CloudFormation is helpful. The tutorial deploys a Presto and Alluxio joint cluster step by step through the AWS web console, using a CloudFormation template.
Note that the EC2 instances launched in the cluster do not qualify for the free usage tier, because the instances need sufficient resources to execute the workload. The instance type we will be using is r4.2xlarge.
Deploying a Presto & Alluxio cluster using CFT
A detailed explanation of how to select the template and specify the cluster details is available in the Presto with caching CFT deployment docs.
You can also find the CloudFormation template on the AWS Marketplace.
Follow the deployment instructions to set up your cluster.
For this tutorial, we suggest the following parameters to create a cluster with enough resources to run the example queries:
- SecurityGroup: a security group with the following ports open
  - Port 22 for SSH
  - Ports 19999 and 30000 for accessing the Alluxio web UI
  - Ports 8080 and 8088 for accessing the Presto web UI
- CoordinatorInstanceType: r4.2xlarge
- WorkersInstanceType: r4.2xlarge
- HACoordinatorsCount: 1
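The suggested values above can also be written down as data, which may be handy if you script the deployment instead of using the web console. The following is a minimal sketch, not the template's documented interface: the parameter names and ports come from the list above, while the helper names, the security group ID, and the CIDR range are hypothetical placeholders. The dictionaries follow the shapes used by the EC2 `IpPermissions` and CloudFormation `Parameters` request fields, and the template may require further parameters that are not listed here.

```python
# Sketch only: parameter names and ports come from the tutorial's list;
# the security group ID, CIDR range, and helper names are hypothetical.

TUTORIAL_PORTS = {
    22: "SSH",
    19999: "Alluxio web UI",
    30000: "Alluxio web UI",
    8080: "Presto web UI",
    8088: "Presto web UI",
}


def ingress_rules(ports, cidr):
    """Return one TCP ingress rule per port, in the EC2 IpPermissions shape."""
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "IpRanges": [{"CidrIp": cidr, "Description": label}],
        }
        for port, label in sorted(ports.items())
    ]


def stack_parameters(security_group_id):
    """Return the suggested values as ParameterKey/ParameterValue pairs."""
    values = {
        "SecurityGroup": security_group_id,  # assumed to take a group ID
        "CoordinatorInstanceType": "r4.2xlarge",
        "WorkersInstanceType": "r4.2xlarge",
        "HACoordinatorsCount": "1",
    }
    return [{"ParameterKey": k, "ParameterValue": v} for k, v in values.items()]


if __name__ == "__main__":
    import json

    # 203.0.113.0/24 is a documentation-only range; use your own client range.
    print(json.dumps(ingress_rules(TUTORIAL_PORTS, "203.0.113.0/24"), indent=2))
    print(json.dumps(stack_parameters("sg-0123456789abcdef0"), indent=2))
```

Restricting the CIDR range to your own client addresses, rather than opening the ports to everyone, is the safer choice for a cluster that exposes web UIs.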
Access the Presto Alluxio cluster
To access the Presto Alluxio cluster, you must obtain the address of the Presto coordinator. The Alluxio CLI can be run on any of the cluster nodes, but the Presto CLI can only be run on the active Presto coordinator node.
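Once you have the coordinator address, a quick reachability check can confirm that the security group ports are actually open before you try SSH or the web UIs. The sketch below uses only the Python standard library; the function names are hypothetical, and `EC2_PUBLIC_DNS` stands in for the coordinator's public DNS name.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


def check_ports(host, ports, timeout=3.0):
    """Map each port to whether it accepted a TCP connection."""
    return {port: port_open(host, port, timeout) for port in ports}
```

For example, `check_ports("EC2_PUBLIC_DNS", (22, 19999, 8080))` returns a dictionary keyed by port. A `False` entry only means the TCP connection failed; it does not say whether the cause is the security group, the service, or the network.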
Explore Alluxio
We’ll use a combination of the Alluxio web UI and the Alluxio CLI to explore the Alluxio cluster status.
Run queries with Presto on Alluxio
In this section, we’ll use Presto and Alluxio to show how Alluxio can substantially reduce query times by serving reads from cached data.
This guide focuses on using presto-cli; however, you can also use the Presto UI at http://EC2_PUBLIC_DNS:8080 to view the status of your queries.
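One way to see the caching effect is to run the same query more than once and compare wall-clock times, with the expectation that later runs benefit from data cached during the first. The sketch below is illustrative only and should be run on the coordinator node. It assumes the CLI is invoked as `presto-cli`, uses the standard Presto CLI options `--server`, `--catalog`, `--schema`, and `--execute`, and treats the `default` schema and any table name you pass in as hypothetical placeholders.

```python
import subprocess
import time


def presto_cli_argv(sql, server="localhost:8080", catalog="hive",
                    schema="default", executable="presto-cli"):
    """Build the argument list for a one-shot presto-cli query."""
    return [
        executable,
        "--server", server,
        "--catalog", catalog,
        "--schema", schema,  # hypothetical schema name
        "--execute", sql,
    ]


def time_runs(run_once, repeats=2):
    """Call run_once() `repeats` times; return each duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_once()
        durations.append(time.perf_counter() - start)
    return durations


def time_query(sql, repeats=2, **cli_options):
    """Run the same query `repeats` times through presto-cli and time each run."""
    argv = presto_cli_argv(sql, **cli_options)
    return time_runs(
        lambda: subprocess.run(argv, check=True, capture_output=True),
        repeats,
    )
```

Calling `time_query("SELECT count(*) FROM some_table")`, where `some_table` is a placeholder, returns one duration per run. Wall-clock timing includes CLI startup overhead, so the query details in the Presto UI remain the more precise source for execution times.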
Presto on Alluxio advantages
Features newly added in Alluxio 2.1.0 make running Presto on Alluxio easier than before. Most importantly, the amount of configuration is reduced; the above workflow was possible out of the box with only a few configuration entries in the existing catalog/hive.properties.
Alluxio’s transparent Hive integration makes it very easy to use Alluxio as a caching layer for Presto. Presto queries can run significantly faster when Alluxio caches frequently accessed data locally, providing memory-speed data analytics for Presto.