Get Started with the Alluxio – Presto Sandbox USING DOCKER
10 min Tutorial – See how Alluxio speeds up Presto queries – even when data is remote!
The Alluxio-Presto sandbox is a Docker application that includes the full analytics stack needed to run Presto queries. The sandbox lets you dive into an interactive environment where you can explore Alluxio, run queries with Presto, and see the performance benefits of using Alluxio. You’ll see how Alluxio can improve query performance even on remote data, approaching the speeds of a local environment. We’ll run one of the TPC-DS queries against TPC-DS data stored in an AWS S3 bucket.
- Download and launch the container
- Explore Alluxio
- Run Queries with Presto on Alluxio
- Next Steps
- Docker installed on your machine (macOS or Linux)
- At least 6GB of RAM available on your local machine to run the container; 8GB is recommended
- Port 19999 should be open and available.
If you have an instance of Alluxio running locally, stop it before launching the sandbox.
If you have a sandbox container running, stop it using:
docker rm -f alluxio-presto-sandbox
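Before launching, it can help to confirm that the ports this guide relies on are actually free. The following is a minimal sketch, not part of the sandbox itself: it assumes `nc` (netcat) is installed, and the function names `port_in_use` and `check_ports` are hypothetical helpers. Port 19999 is the Alluxio web UI and port 8080 is the Presto UI used later in this guide.

```shell
# Pre-flight sketch: report whether the given local ports already have a listener.
# Assumption: `nc` (netcat) is installed on the host.

port_in_use() {
    # nc -z only probes for a listener and sends no data.
    nc -z localhost "$1" >/dev/null 2>&1
}

check_ports() {
    rc=0
    for port in "$@"; do
        if port_in_use "$port"; then
            echo "port $port is in use"
            rc=1
        else
            echo "port $port is free"
        fi
    done
    # Non-zero exit status if any port was taken.
    return "$rc"
}

# Example: check_ports 19999 8080
```

If a port is reported as in use, stop whatever is listening on it (for example a leftover sandbox container) before continuing.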
DOWNLOAD AND LAUNCH THE CONTAINER
We’ll use a combination of the Alluxio web UI at http://localhost:19999 and the Alluxio CLI to explore the Alluxio filesystem and cluster status.
The container comes with an Amazon S3 bucket pre-mounted in Alluxio at the /scale1 directory. It contains data for TPC-DS benchmarks at the “scale 1” size factor, which amounts to about 1GB of data across multiple tables.
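One way to look at the pre-mounted data from the host is to run the Alluxio CLI inside the container with `docker exec`. The sketch below is an illustration under assumptions: `alluxio_in_sandbox` is a hypothetical helper name, and it assumes the `alluxio` launcher is on the container's PATH. The container name `alluxio-presto-sandbox` is the one used elsewhere in this guide.

```shell
# Run an Alluxio CLI command inside the sandbox container.
# Assumption: the `alluxio` launcher is on the container's PATH.
alluxio_in_sandbox() {
    docker exec alluxio-presto-sandbox alluxio "$@"
}

# Example: list the TPC-DS tables under the pre-mounted directory
#   alluxio_in_sandbox fs ls /scale1
```

Wrapping the call in a function keeps the container name in one place, so the same helper works for any other `alluxio fs` subcommand you want to try.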
Open the Alluxio web UI at http://localhost:19999 to check if the Alluxio master has started successfully. If not, wait a few moments, refresh the page, and it should become available.
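The "wait and refresh" step can also be scripted. The following is a rough sketch that polls the web UI with `curl` until it responds. The helper name `wait_for_url`, the default of 30 attempts, and the 2-second delay are illustrative assumptions, not values from the sandbox.

```shell
# Poll a URL until it answers or the attempts run out.
# usage: wait_for_url URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_url() {
    url=$1
    attempts=${2:-30}
    delay=${3:-2}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # -s: quiet, -f: treat HTTP error responses as failure, -o: discard the body
        if curl -sf -o /dev/null "$url"; then
            echo "$url is up"
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "$url did not respond after $attempts attempts" >&2
    return 1
}

# Example: wait_for_url http://localhost:19999
```

A successful response only shows that the web server is answering; the page itself is still the place to confirm the cluster status.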
RUN QUERIES WITH PRESTO ON ALLUXIO
In this section we’ll use Presto and Alluxio to show how reading cached data from Alluxio can substantially reduce query times.
This guide focuses on using Presto through the command line; however, you can also use the Presto UI at http://localhost:8080 to view the status of your queries.
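If you prefer to stay on the command line, the same address that serves the Presto UI also answers HTTP requests, so you can check that the coordinator has finished starting before you run queries. This is a hedged sketch: `presto_ready` is a hypothetical helper, and it assumes the coordinator's `/v1/info` endpoint returns JSON with a `starting` field. The `grep` match is a crude text check rather than real JSON parsing.

```shell
# Succeeds once the Presto coordinator reports that startup has finished.
# Assumption: GET /v1/info returns JSON containing "starting": true|false.
presto_ready() {
    base_url=${1:-http://localhost:8080}
    curl -sf "$base_url/v1/info" | grep -q '"starting" *: *false'
}

# Example:
#   presto_ready && echo "Presto is ready for queries"
```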