Serving Structured Data in Alluxio: Example

This article walks through a simple example to illustrate how Structured Data Management, available in the latest Alluxio 2.1.0 release, helps SQL and structured data workloads.

In the previous article, we described the new Structured Data Management in the latest Alluxio 2.1.0 release. In this article, we will use an example to demonstrate how it works on your local laptop.

Step1: Download and Setup Alluxio

Download and Deploy Alluxio 2.1.0

The first step is to download the Alluxio 2.1.0 release and deploy it by following the deployment guide. For a local deployment, it is as simple as running the command:

$ ./bin/alluxio-start.sh local -f

After Alluxio is up, the new Structured Data Service will be running without any other configuration.

Install and Configure the Alluxio Presto Connector

The Alluxio Presto Connector is the client Presto uses to access Alluxio Structured Data Management. Presto interacts with the other Alluxio services through this connector. In this developer preview version, the connector must be installed manually into your Presto deployment.

This connector is bundled in the Alluxio 2.1.0 release in the directory: ${ALLUXIO_HOME}/client/presto/plugins/. In order to install the Alluxio Presto connector, the connector directory must be copied into your Presto installation on all your Presto nodes. Here is an example:

$ cp -R ${ALLUXIO_HOME}/client/presto/plugins/presto-hive-alluxio-319/ \
    ${PRESTO_HOME}/plugin/hive-alluxio/

Once the connector is installed, it can be used to configure a Presto catalog. Add a new catalog configuration to Presto. For example, add a new file:

${PRESTO_HOME}/etc/catalog/catalog_alluxio.properties

With the contents:

connector.name=hive-alluxio

hive.metastore=alluxio

hive.metastore.alluxio.master.address=HOSTNAME:PORT

Once the connector is installed and configured, the Presto server must be restarted.
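
As a minimal sketch, the catalog file described above can also be written from the shell. The fallback to a scratch directory when PRESTO_HOME is unset, and the ALLUXIO_MASTER_HOST and ALLUXIO_MASTER_PORT variables (defaulting to localhost and 19998, assumed here to be the master RPC port of a local deployment), are illustrative assumptions and not part of the Alluxio or Presto tooling. Substitute the HOSTNAME:PORT of your own Alluxio master.

```shell
set -eu

# Assumption: fall back to a scratch directory so the sketch is safe to try
# without touching a real Presto installation.
PRESTO_HOME="${PRESTO_HOME:-$(mktemp -d)}"

# Assumption: placeholders for the HOSTNAME:PORT of the Alluxio master.
ALLUXIO_MASTER_HOST="${ALLUXIO_MASTER_HOST:-localhost}"
ALLUXIO_MASTER_PORT="${ALLUXIO_MASTER_PORT:-19998}"

catalog_file="${PRESTO_HOME}/etc/catalog/catalog_alluxio.properties"
mkdir -p "$(dirname "${catalog_file}")"

# Write the three properties shown in the article.
cat > "${catalog_file}" <<EOF
connector.name=hive-alluxio
hive.metastore=alluxio
hive.metastore.alluxio.master.address=${ALLUXIO_MASTER_HOST}:${ALLUXIO_MASTER_PORT}
EOF

echo "Wrote ${catalog_file}"
```

The file name (without the .properties extension) is the catalog name that the Presto CLI refers to later, so keep it in sync with the --catalog argument.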

Step2: Attach a Hive Metastore to Alluxio Catalog Service

The Alluxio Catalog Service manages the metadata of structured data in the system. It is responsible for all the database, table, and schema information, as well as the location of all the stored data. This developer preview version supports attaching a Hive Metastore as an “UnderDatabase” (an abstraction of other external catalogs and databases) to the Alluxio Catalog Service.

Attaching a Hive Metastore to the Alluxio Catalog Service is easy. Simply run the “attachdb” command to attach a Hive database to the Alluxio catalog. For example, in the Alluxio installation directory:

$ ./bin/alluxio table attachdb hive thrift://HOSTNAME:9083 hive_db_name

This command attaches the database named “hive_db_name” from the Hive Metastore found at thrift://HOSTNAME:9083. After attachdb completes, the Alluxio Catalog Service serves the table information for that database.

See the documentation for the attachdb command for more information.
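
The attachdb invocation above can be parameterized so that the Metastore URI is assembled in one place. The following is only a sketch: the HIVE_METASTORE_HOST, HIVE_METASTORE_PORT, HIVE_DB_NAME, and DRY_RUN variables and the run helper are hypothetical names introduced for illustration, and the script defaults to printing the command rather than executing it.

```shell
set -eu

# Hypothetical placeholders; replace them with your own values.
HIVE_METASTORE_HOST="${HIVE_METASTORE_HOST:-HOSTNAME}"
HIVE_METASTORE_PORT="${HIVE_METASTORE_PORT:-9083}"
HIVE_DB_NAME="${HIVE_DB_NAME:-hive_db_name}"

# Assumption: the script is started from the Alluxio installation directory
# unless ALLUXIO_HOME says otherwise.
ALLUXIO_HOME="${ALLUXIO_HOME:-.}"

metastore_uri="thrift://${HIVE_METASTORE_HOST}:${HIVE_METASTORE_PORT}"

# Print the command by default; execute it only when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

run "${ALLUXIO_HOME}/bin/alluxio" table attachdb hive "${metastore_uri}" "${HIVE_DB_NAME}"
```

Running it with DRY_RUN=0 against a live Alluxio deployment issues the same attachdb command shown earlier in this step.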

Step3: Use Alluxio Structured Data Management with Presto

Once a database is attached, the catalog service can be used from Presto. Start the Presto CLI with the previously configured Alluxio catalog:

$ presto --catalog catalog_alluxio

Within the Presto CLI, you can run Presto commands and queries that access the Alluxio Catalog Service via the Alluxio Presto connector. The Alluxio Catalog Service automatically serves the table information from the Hive Metastore, while transparently using the Alluxio mounted locations.
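
For a quick non-interactive check, a few queries can be passed to the Presto CLI with its --execute option. This is a sketch under stated assumptions: the run_query helper is a hypothetical name, the attached database is assumed to appear in the catalog as a schema called “hive_db_name”, and “test_table” is assumed to exist in it. When no presto binary is on the PATH, the script only prints what it would run.

```shell
set -u

PRESTO_CATALOG="catalog_alluxio"
# Assumptions: schema and table names; adjust them to your attached database.
HIVE_DB_NAME="${HIVE_DB_NAME:-hive_db_name}"
TABLE_NAME="${TABLE_NAME:-test_table}"

# Run one query through the Presto CLI, or print it if the CLI is missing.
run_query() {
  if command -v presto >/dev/null 2>&1; then
    presto --catalog "${PRESTO_CATALOG}" --execute "$1" </dev/null
  else
    echo "presto not found; would run: $1"
  fi
}

# List the schemas served by the Alluxio Catalog Service.
run_query "SHOW SCHEMAS"
# List the tables of the attached database.
run_query "SHOW TABLES FROM ${HIVE_DB_NAME}"
# Read from a table; the data is served from the Alluxio locations.
run_query "SELECT count(*) FROM ${HIVE_DB_NAME}.${TABLE_NAME}"
```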

Transform a Table

Some tables may not be stored in a compute-optimized way. If any tables are stored as too many files, or in the CSV format, you can use the Alluxio Transformation Service to transform the data into a more compute-optimized representation. For example, in the Alluxio installation directory:

$ ./bin/alluxio table transform hive_db_name test_table

This command initiates a transformation to optimize the table “test_table” in the database “hive_db_name”. Once the transformation is complete, the metadata is automatically updated in the Alluxio Catalog Service.

See the documentation on the transformation service for more information.

Try it out!

We are happy to introduce the developer preview of Alluxio Structured Data Management! This is an exciting new effort for Alluxio to provide further benefits, especially for SQL frameworks. Get started with Alluxio Structured Data Management with Presto; we would appreciate any feedback on features and issues in the Alluxio GitHub repository! If you run into any problems, feel free to ask questions in our Alluxio community Slack channel.