TUTORIAL: Getting Started with Alluxio & AWS CloudFormation


5 MIN TUTORIAL

Among the many ways to deploy Alluxio on AWS, one of the simplest approaches is to use AWS CloudFormation. With a few clicks, all the resources needed for an Alluxio cluster will be automatically modeled and provisioned. You can define the type of Alluxio cluster you want and how you want to configure it. This tutorial outlines the steps to use the Alluxio CloudFormation template to provision a cluster, including setting up a cluster in high availability mode.

PREREQUISITES

Familiarity with CloudFormation and EC2 is helpful but not required. The tutorial launches an Alluxio cluster from the perspective of a user with a newly created AWS account.

Note that the launched instances do not qualify for the AWS free usage tier. The default instance type we will be using, r4.xlarge, costs only 7 cents an hour.

LAUNCH AN ALLUXIO CLUSTER

This section outlines how to launch an Alluxio cluster with a single master node.

Step 0: Create an EC2 Key Pair

Note: If you already have a valid EC2 key pair, skip ahead to Step 1.

CloudFormation will analyze existing AWS resources on your account to fill in values needed by the Alluxio CloudFormation template (CFT). For example, you need to choose an existing EC2 key pair that is in the region in which you are creating the Alluxio cluster stack. 

Open a web browser and navigate to the EC2 console, logging in with your credentials when prompted. Note the AWS region in the upper right corner.

Click on the region to open a dropdown of available regions. It is recommended to select the region that is geographically closest to your computer.

On the left sidebar, click the Key Pairs link to open the Key Pairs page.

Click the Create Key Pair button to create a new key pair and download it to your computer.

Assuming the downloaded key pair is located at ~/Downloads/keypair.pem, change the file permissions of the key pair to read-only by running the following command:

$ chmod 400 ~/Downloads/keypair.pem
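If you prefer the command line to the console, the same key pair can be created with the AWS CLI. The following is a minimal sketch, assuming the AWS CLI is installed and configured for the region you chose; the key pair name in the example is hypothetical. The key_mode helper only reads back the permission bits so you can confirm that the chmod took effect, since ssh rejects private keys that other users can read.

```shell
#!/bin/sh
# Create an EC2 key pair with the AWS CLI, save the private key, and make it read-only.
# $1 = key pair name (hypothetical in this tutorial), $2 = path for the .pem file
create_key_pair() {
  aws ec2 create-key-pair --key-name "$1" \
    --query 'KeyMaterial' --output text > "$2" &&
  chmod 400 "$2"
}

# Print the octal permission bits of a file.
# Tries GNU stat first, then falls back to BSD/macOS stat.
key_mode() {
  stat -c '%a' "$1" 2>/dev/null || stat -f '%Lp' "$1"
}

# Example (not run automatically):
#   create_key_pair alluxio-tutorial-key ~/Downloads/keypair.pem
#   key_mode ~/Downloads/keypair.pem
```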

Step 1:  Choose the CloudFormation Template to Launch

Navigate to the CloudFormation console and click Create stack.

Copy the Amazon S3 URL of the Alluxio CloudFormation template, https://alluxio-public.s3.amazonaws.com/cft/AlluxioCFT.json, into the Amazon S3 URL field.


Step 2: Specify Stack Details

Specify the details of your Alluxio cluster on this page.

  • Stack details: The stack name of your Alluxio cluster.
  • Network Configuration: Choose the VPC and Subnet from the existing values. It’s assumed that those values are preconfigured in your account. If not, create them by following the AWS VPC docs. These parameters isolate the Alluxio cluster from other virtual private clouds and availability zones. Choose at least one security group to act as a virtual firewall that controls the traffic allowed to reach your Alluxio cluster. It’s recommended that ports 19999 and 30000 are accessible in order to access the Alluxio Web UI. Additionally, it’s recommended that port 22 is accessible for SSH access.
  • EC2 Configuration: Choose a KeyName, which is the name of an EC2 key pair. Providing the key pair enables SSH access to the cluster instances.

Choose the MasterInstanceType and WorkerInstanceType suitable for your workload. The default value is r4.xlarge. Larger master instances provide more memory for the Alluxio master to store metadata. The Alluxio worker memory size is proportional to the memory of the worker instance and determines how much data the worker can store.

By default, an Alluxio cluster with one master and one worker will be launched. You can choose to enable high availability for the cluster by selecting Yes in the EnableHA field. If this is selected, 3 masters will be provisioned. Specify the WorkersCount, which sets the number of Alluxio workers for the cluster.

  • Alluxio Configuration: Specify the S3RootMount, a valid S3 address that will be mounted to the Alluxio root. The current user account must have read, write, and list permissions for this S3 address. Alluxio helps cache frequently used S3 data for data applications.

Specify the AlluxioProperties field to provide additional Alluxio site properties. The Alluxio CFT exposes only the parameters necessary to create an Alluxio cluster. If you want to fine-tune Alluxio behavior, specify the desired properties in the format KEY1=VALUE1,KEY2=VALUE2. The specified key-value pairs will be appended to the alluxio-site.properties file on all nodes in the Alluxio cluster.
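The KEY1=VALUE1,KEY2=VALUE2 format is easy to get wrong when typed by hand. The helper below is a small sketch that joins individual KEY=VALUE arguments into that form. It is not part of the template; the function name is hypothetical, and rejecting commas inside a pair is an assumption made here because the comma already separates pairs.

```shell
#!/bin/sh
# Join KEY=VALUE arguments into the comma-separated form used by the
# AlluxioProperties field, e.g. KEY1=VALUE1,KEY2=VALUE2.
join_alluxio_properties() {
  result=""
  for pair in "$@"; do
    case "$pair" in
      *=*) ;;  # looks like KEY=VALUE, accept
      *) echo "not in KEY=VALUE form: $pair" >&2; return 1 ;;
    esac
    case "$pair" in
      *,*) echo "commas are not supported in this sketch: $pair" >&2; return 1 ;;
    esac
    # Prefix a comma only when something has already been collected.
    result="${result:+$result,}$pair"
  done
  printf '%s\n' "$result"
}

# Example with two Alluxio client properties (values chosen only for illustration):
#   join_alluxio_properties \
#     alluxio.user.file.writetype.default=CACHE_THROUGH \
#     alluxio.user.file.readtype.default=CACHE
```

Paste the printed string into the AlluxioProperties field as-is.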

Congratulations! All the required parameters are set. Click the Next button on the bottom right to navigate to the stack options configuration.


Step 3: Stack options

All the stack options are provided by CloudFormation by default. These options include adding tags to resources created in the Alluxio cluster stack, choosing the IAM role to limit the permissions available, and specifying the stack policy and rollback configuration. For further insight into these options, see the stack options documentation. In the current example, all the options are left at their default values, so click the Next button directly.


Step 4: Review and Launch

Finally, review the stack details of your Alluxio cluster. After reviewing the content, click the Create stack button to start creating the Alluxio cluster.

Note that just above the final Create stack button there will be a blue box informing you that the Alluxio CloudFormation template requires capabilities to create an IAM role. In order to create the Alluxio cluster, you need to mark the checkbox.

Check the box and click Create stack.

The Alluxio cluster is now being created!

The stack may take minutes to hours to launch, depending on how many nodes the cluster has. The CREATE_COMPLETE status means that the Alluxio cluster was created successfully and is running!
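For readers who would rather script Steps 1 through 4, the console flow maps roughly onto the AWS CLI as sketched below. Several things here are assumptions: the AWS CLI is installed and configured, the IAM acknowledgement checkbox corresponds to CAPABILITY_IAM (a template that names its IAM resources would need CAPABILITY_NAMED_IAM instead), and the remaining template parameters have usable defaults. The network parameters are deliberately left out because this tutorial does not give their exact keys; look them up in the template before running this.

```shell
#!/bin/sh
# Format one stack parameter in the AWS CLI shorthand syntax.
cfn_param() {
  printf 'ParameterKey=%s,ParameterValue=%s' "$1" "$2"
}

# Create the stack, wait until creation finishes, then print the stack outputs.
# $1 = stack name, $2 = EC2 key pair name, $3 = S3 address to mount as the Alluxio root
# The wait step fails if the stack does not reach CREATE_COMPLETE.
launch_alluxio_stack() {
  aws cloudformation create-stack \
    --stack-name "$1" \
    --template-url "https://alluxio-public.s3.amazonaws.com/cft/AlluxioCFT.json" \
    --capabilities CAPABILITY_IAM \
    --parameters \
      "$(cfn_param KeyName "$2")" \
      "$(cfn_param S3RootMount "$3")" \
      "$(cfn_param WorkersCount 1)" &&
  aws cloudformation wait stack-create-complete --stack-name "$1" &&
  aws cloudformation describe-stacks --stack-name "$1" \
    --query 'Stacks[0].Outputs' --output table
}

# Example (not run automatically; names are placeholders):
#   launch_alluxio_stack <stack_name> <key_pair_name> <s3_address>
```

The waiter polls for a limited time, so for very large clusters it may give up before the stack finishes; in that case, re-run the wait command or check the status in the console.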


Step 5: Explore the Alluxio Cluster

After the Alluxio cluster is created successfully, you can interact with it!

The instructions differ depending on whether you chose high availability mode.

For a single-master cluster, navigate to the stack's Outputs section, which provides the Alluxio cluster summary, the command to SSH into the Alluxio master, and the master web UI link.

Click the master web UI link to see the Alluxio cluster summary. Alluxio data, logs, configuration, and workers are available on the sub-pages.

Open a terminal window and run the SSH command from the Outputs section to connect to the master node.

In high availability mode, three Alluxio master nodes are launched and choose a leader through internal leader election. Note that the stack Outputs section is less informative in this mode, because CloudFormation cannot determine the leader master address.

Navigate to the EC2 console Instances page, where the instances are named <STACK_NAME>-AlluxioMaster/Worker. Click one of the instances to see the public DNS and private DNS of that node.

Use the following command to ssh into one of the cluster nodes with the public DNS of this node.

$ ssh -i /path/to/your/pem ec2-user@<node_public_dns>

Inside this node, you can run the Alluxio CLI to interact with the Alluxio namespace and the mounted under file systems in the same way as with single-master Alluxio clusters. Also, the alluxio fsadmin report command can help you find the private DNS of the leader master.

Search for the private DNS of the leader master in the EC2 console to find its public DNS.
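The two lookups above can also be scripted. This is a sketch under two assumptions: that the fsadmin report output contains a line of the form "Master Address: <host>:<port>" (check the output of your Alluxio version and adjust the pattern if it differs), and that the AWS CLI is configured wherever the second function runs. Both function names are hypothetical.

```shell
#!/bin/sh
# Read `alluxio fsadmin report` output on stdin and print the leader master host.
# Assumes a line shaped like "Master Address: <host>:<port>".
leader_host_from_report() {
  awk -F': *' '/Master Address/ { print $2; exit }'
}

# Print the public DNS name of the EC2 instance that has the given private DNS name.
public_dns_for() {
  aws ec2 describe-instances \
    --filters "Name=private-dns-name,Values=$1" \
    --query 'Reservations[].Instances[].PublicDnsName' \
    --output text
}

# Example (first command on a cluster node, second wherever the AWS CLI is configured):
#   alluxio fsadmin report | leader_host_from_report
#   public_dns_for <leader_private_dns>
```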

Open the Alluxio leader master web UI at https://<leader_master_public_dns>:19999

After SSHing into any of the cluster nodes, you can explore the current node settings and run the Alluxio CLI to explore the Alluxio cluster.

The Alluxio home directory is /opt/alluxio under which you can view the configuration and logs of the current node. Note that Alluxio is installed and started as the alluxio user. 

The provided S3 address is mounted as the Alluxio root. The alluxio fs mount command provides the mount information. 

$ alluxio fs mount

s3://alluxio-cft-bucket/data  on / (s3, capacity=-1B, used=-1B, not read-only, not shared, properties={})

You can see the data mounted to Alluxio from the Alluxio web UI or by running the alluxio fs ls / command. The following data exists in s3://alluxio-cft-bucket/data and is mounted to the Alluxio root.

$ alluxio fs ls /

           3082       PERSISTED 08-27-2019 20:55:55:000   0% /geoMap.csv
           2098       PERSISTED 08-27-2019 20:55:55:000   0% /multiTimeline.csv
            768       PERSISTED 08-27-2019 20:55:55:000   0% /relatedEntities.csv
           1168       PERSISTED 08-27-2019 20:55:46:000   0% /relatedQueries.csv

Run other Alluxio CLI commands to explore the Alluxio cluster!


ADVANCED USAGE

Restart Alluxio cluster

To restart the Alluxio cluster, run the following commands on each master node as the alluxio user:

$ /opt/alluxio/bin/alluxio-start.sh -a master

$ /opt/alluxio/bin/alluxio-start.sh -a job_master

$ /opt/alluxio/bin/alluxio-start.sh -a proxy


Then run the following commands on each worker node as the alluxio user:

$ /opt/alluxio/bin/alluxio-start.sh -a worker

$ /opt/alluxio/bin/alluxio-start.sh -a job_worker

$ /opt/alluxio/bin/alluxio-start.sh -a proxy
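To avoid retyping the per-role command lists on every node, they can be wrapped in a small helper. This is only a convenience sketch that reproduces the commands listed above; the function names are hypothetical, and it should be run as the alluxio user on the node in question.

```shell
#!/bin/sh
ALLUXIO_HOME=/opt/alluxio

# Print the start commands for a node role ("master" or "worker"),
# mirroring the per-role command lists in this section.
alluxio_start_cmds() {
  case "$1" in
    master) procs="master job_master proxy" ;;
    worker) procs="worker job_worker proxy" ;;
    *) echo "usage: alluxio_start_cmds master|worker" >&2; return 1 ;;
  esac
  for p in $procs; do
    printf '%s/bin/alluxio-start.sh -a %s\n' "$ALLUXIO_HOME" "$p"
  done
}

# Run the commands for a role, stopping at the first failure.
alluxio_start_role() {
  alluxio_start_cmds "$1" | sh -e
}

# Example (not run automatically):
#   alluxio_start_cmds master    # preview the commands
#   alluxio_start_role master    # run them on a master node
```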


Launch an Alluxio cluster with spot worker instances

By default, all instances inside the Alluxio cluster will be launched on demand. EC2 instances can also be launched as spot instances, saving a significant portion of the instance cost at the risk of having these instances terminated and reclaimed by EC2 at any time. Alluxio masters are critical to the cluster and should not be launched as spot instances. In contrast, worker instances fit the use case of spot instances because new workers can register themselves to the cluster and old workers will be marked as lost. Because the addition and loss of workers do not affect basic Alluxio functionality, we can support launching Alluxio clusters with spot instances for workers.

Creating an Alluxio cluster with spot worker instances requires a different Alluxio CloudFormation template. The template URL is https://alluxio-public.s3.amazonaws.com/cft/AlluxioCFTSpotInstance.json

The configuration is the same with the addition of the parameter WorkerSpotPrice, which specifies the maximum hourly price that you are willing to pay for spot instances. 

Note that the price should be set according to the worker instance type. If the price is too low, Alluxio workers may not be fully provisioned and the stack will show as CREATE_FAILED.
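One way to choose a sensible WorkerSpotPrice is to look at the current spot price for the worker instance type before launching. The sketch below assumes the AWS CLI is installed and configured for your region; the function names are hypothetical, and the comparison helper is plain arithmetic that does not call AWS.

```shell
#!/bin/sh
# List the current spot price per availability zone for an instance type.
# Using "now" as the start time returns the price in effect at this moment.
# $1 = instance type, e.g. r4.xlarge
current_spot_prices() {
  aws ec2 describe-spot-price-history \
    --instance-types "$1" \
    --product-descriptions "Linux/UNIX" \
    --start-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
    --query 'SpotPriceHistory[].[AvailabilityZone,SpotPrice]' \
    --output text
}

# Succeed (exit status 0) when the planned maximum price is at least the market price.
# $1 = planned WorkerSpotPrice, $2 = observed market price
spot_bid_ok() {
  awk -v bid="$1" -v market="$2" 'BEGIN { exit !(bid + 0 >= market + 0) }'
}

# Example (not run automatically; prices are placeholders):
#   current_spot_prices r4.xlarge
#   spot_bid_ok <planned_price> <market_price> && echo "price is high enough right now"
```

Spot prices change over time, so a price that is sufficient at launch does not guarantee that workers will never be reclaimed.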