Alluxio Kubernetes Operator Tutorial: Simplifying Deploying and Managing Alluxio Clusters

This blog provides a tutorial on using the Kubernetes operator to simplify deploying and managing Alluxio clusters on Kubernetes.

Introduction

The Alluxio Kubernetes operator makes it easier to deploy and manage Alluxio and its datasets on Kubernetes. With the operator, Alluxio clusters can be deployed and managed like any other native Kubernetes application.

The operator handles common tasks such as provisioning pods, configuring services, mounting storage volumes, and loading datasets. This automation reduces the effort required to run Alluxio on Kubernetes and cuts operational costs.

The on-demand data loading enabled by the operator, via kubectl commands, allows users to load data into Alluxio only when needed. This reduces instance costs (such as EC2 costs) by avoiding storing unused data in Alluxio.

This blog provides a tutorial on deploying Alluxio on Kubernetes with the operator. You will learn the following, step by step:

  • Install and deploy the Alluxio Kubernetes operator
  • Deploy and maintain a dataset
  • Deploy Alluxio with the Kubernetes operator
  • Load data into Alluxio
  • Uninstall and clean up Alluxio and the dataset

Prerequisites

  • A Kubernetes cluster with version at least 1.19, with feature gate enabled.
  • Cluster access to the Alluxio Docker image alluxio/alluxio, or a downloaded image tarball of Alluxio.
  • Ensure the cluster’s Kubernetes Network Policy allows for connectivity between applications (Alluxio clients) and the Alluxio Pods on the defined ports.
  • Helm 3, version 3.6.0 or later, installed on the control plane of the Kubernetes cluster.
  • The following RBAC permissions in the Kubernetes cluster, which the operator needs in order to work:
    • Permission to create CRD (Custom Resource Definition);
    • Permission to create ServiceAccount, ClusterRole, and ClusterRoleBinding for the operator pods;
    • Permission to create a namespace that the operator will be in.
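
Some of these prerequisites can be checked from the command line before you start. The following is a minimal sketch, not part of the operator itself: the function names are hypothetical, it assumes a `sort` that supports `-V` (GNU coreutils), and it only covers the Helm version and the RBAC permissions listed above.

```shell
#!/usr/bin/env bash
# Sketch: pre-flight checks for some of the prerequisites listed above.
# All function names here are hypothetical helpers, not operator tooling.

# version_ge A B: succeeds when version A >= version B (relies on `sort -V`).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Strip a leading "v" and any "+build" or "-pre" suffix: v3.12.0+gabc -> 3.12.0
normalize_version() {
  printf '%s\n' "$1" | sed -E 's/^v//; s/[+-].*$//'
}

check_prereqs() {
  local helm_ver res
  helm_ver="$(normalize_version "$(helm version --short)")"
  if version_ge "$helm_ver" "3.6.0"; then
    echo "helm $helm_ver: ok"
  else
    echo "helm $helm_ver: need >= 3.6.0"
  fi
  # Ask the API server whether the current user may create each resource type.
  # Namespaced resources (serviceaccounts) are checked in the current namespace.
  for res in customresourcedefinitions serviceaccounts clusterroles \
             clusterrolebindings namespaces; do
    echo "create $res: $(kubectl auth can-i create "$res")"
  done
}
```

After sourcing the script, calling `check_prereqs` prints one line per check. A "no" next to a resource type means you will need a cluster administrator to grant that permission first.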

Deploy Alluxio Kubernetes Operator

Use the Helm chart to deploy the Alluxio Kubernetes operator. Follow the steps below:

Download Alluxio Kubernetes Operator

Download the Alluxio Kubernetes Operator from https://github.com/Alluxio/k8s-operator and enter the root directory of the project.

Install Operator

Install the operator by running:

$ helm install operator ./deploy/charts/alluxio-operator

The operator automatically creates the `alluxio-operator` namespace and installs all of its components there.

Run Operator

Run the command below to make sure all pods of the operator are running as expected:

$ kubectl get pods -n alluxio-operator
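
If you are scripting the installation, you may prefer to block until the pods are ready instead of checking by hand. Here is a minimal sketch using `kubectl wait`. The function name and the 120-second default timeout are illustrative assumptions, and it assumes every pod in the namespace is expected to become Ready.

```shell
#!/usr/bin/env bash
# Sketch: block until every pod in the operator namespace reports Ready.
# wait_for_operator is a hypothetical helper; the default timeout is arbitrary.
wait_for_operator() {
  local namespace="${1:-alluxio-operator}"
  local timeout="${2:-120s}"
  kubectl wait pods --all --for=condition=Ready \
    --namespace "$namespace" --timeout="$timeout"
}
```

Calling `wait_for_operator` with no arguments waits on the `alluxio-operator` namespace. The command exits with a non-zero status if the timeout expires first, so a script can stop there rather than continue against an operator that is not ready.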

Deploy Dataset

Create Dataset Configuration

Create a dataset configuration file, dataset.yaml. Its `apiVersion` must be `k8s-operator.alluxio.com/v1alpha1` and its `kind` must be `Dataset`. Here is an example:

apiVersion: k8s-operator.alluxio.com/v1alpha1
kind: Dataset
metadata:
  name: my-dataset
spec:
  dataset:
    path: <path of your dataset>
    credentials:
      - <property 1 for accessing your dataset>
      - <property 2 for accessing your dataset>
      - ...

Deploy Dataset

Deploy your dataset by running:

$ kubectl create -f dataset.yaml
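
Before creating a resource, it can help to have the API server validate the manifest without persisting anything. The sketch below uses kubectl's server-side dry run. The function name is hypothetical, and it assumes the operator's CRDs are already installed so the server recognizes the `Dataset` kind.

```shell
#!/usr/bin/env bash
# Sketch: validate a manifest against the API server without creating it.
# validate_manifest is a hypothetical helper name.
validate_manifest() {
  local file="$1"
  if [ ! -f "$file" ]; then
    echo "manifest not found: $file" >&2
    return 1
  fi
  # --dry-run=server sends the request to the API server but does not store it.
  kubectl create --dry-run=server -f "$file"
}
```

For example, `validate_manifest dataset.yaml` reports schema errors such as a misspelled `kind` or a wrong `apiVersion` while leaving the cluster unchanged. The same helper works for the other manifests in this tutorial.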

Check Status of Dataset

Check the status of the dataset by running:

$ kubectl get dataset <dataset-name>

Deploy Alluxio

Prepare Resource Configuration File

Prepare a resource configuration file, alluxio-config.yaml. Its `apiVersion` must be `k8s-operator.alluxio.com/v1alpha1` and its `kind` must be `AlluxioCluster`. Here is an example:

apiVersion: k8s-operator.alluxio.com/v1alpha1
kind: AlluxioCluster
metadata:
  name: my-alluxio-cluster
spec:
  dataset: my-dataset  # dataset name is required
  worker:
    count: 4
  pagestore:
    type: hostPath
    quota: 512Gi
    hostPath: /mnt/alluxio
  fuse:
    enabled: true

All configurable properties in the spec section can be found in deploy/charts/alluxio/values.yaml.
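
To get a quick overview of what that file exposes, you can list its top-level keys. This is a rough sketch with a hypothetical function name. It uses plain text matching rather than a YAML parser, so it assumes the top-level keys start in the first column and are not commented out.

```shell
#!/usr/bin/env bash
# Sketch: print the top-level mapping keys of a YAML file.
# list_top_level_keys is a hypothetical helper; it matches text, not YAML syntax.
list_top_level_keys() {
  # Keep lines that start in column 0 with a key followed by a colon,
  # then drop everything from the colon onward.
  grep -E '^[A-Za-z_][A-Za-z0-9_-]*:' "$1" | cut -d: -f1
}
```

Run from the root of the operator project, `list_top_level_keys deploy/charts/alluxio/values.yaml` prints one key per line, which you can compare against the sections used in the example above, such as `worker`, `pagestore`, and `fuse`.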

Deploy Alluxio Cluster

Deploy the Alluxio cluster by running:

$ kubectl create -f alluxio-config.yaml

Check Status of Alluxio Cluster

Check the status of the Alluxio cluster by running:

$ kubectl get alluxiocluster <alluxio-cluster-name>

Load the Data into Alluxio

To load your data into the Alluxio cluster so that your application can read it faster, create a resource file, load.yaml. Here is an example:

apiVersion: k8s-operator.alluxio.com/v1alpha1
kind: Load
metadata:
  name: my-load
spec:
  dataset: my-dataset
  path: /

Then run the following command to start the load:

$ kubectl create -f load.yaml

To check the status of the load:

$ kubectl get load
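
When a status check for a dataset, cluster, or load shows something unexpected, the full resource and its events usually say more than the summary line. Below is a small sketch built on standard kubectl subcommands. The function name is hypothetical, and the fields that appear in the output depend on the operator version.

```shell
#!/usr/bin/env bash
# Sketch: dump the full object and its recent events for one custom resource.
# inspect_resource is a hypothetical helper name.
inspect_resource() {
  local kind="$1" name="$2"
  # Full object, including whatever status the operator has written.
  kubectl get "$kind" "$name" -o yaml
  # Human-readable summary with recent events.
  kubectl describe "$kind" "$name"
}
```

For example, `inspect_resource load my-load` shows the load created above, and `inspect_resource alluxiocluster my-alluxio-cluster` does the same for the cluster.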

Uninstall

Run the following commands to uninstall the dataset and the Alluxio cluster:

$ kubectl delete dataset <dataset-name>
$ kubectl delete alluxiocluster <alluxio-cluster-name>
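
If you clean up repeatedly, for example in a test environment, the two commands can be wrapped in a script that tolerates resources that are already gone. The sketch below is an assumption-laden convenience, not part of the operator. Both function names are hypothetical, and `uninstall_operator` assumes the Helm release is still named `operator`, as in the install step, and lives in your current namespace context.

```shell
#!/usr/bin/env bash
# Sketch: repeatable cleanup. Both function names are hypothetical helpers.

# Delete the dataset and the cluster, in the same order as the commands above.
# --ignore-not-found makes a second run succeed when nothing is left to delete.
cleanup_alluxio() {
  local dataset="$1" cluster="$2"
  kubectl delete dataset "$dataset" --ignore-not-found
  kubectl delete alluxiocluster "$cluster" --ignore-not-found
}

# Optional: remove the operator's Helm release as well.
uninstall_operator() {
  helm uninstall operator
}
```

With the names used in this tutorial, that is `cleanup_alluxio my-dataset my-alluxio-cluster`, followed by `uninstall_operator` only if you no longer need the operator.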

Summary

Through this tutorial, you have learned how to leverage the operator to simplify deploying and managing Alluxio on Kubernetes.

To learn more about Alluxio, join the 11k+ members of the Alluxio community Slack channel to ask questions and provide feedback.