Serving Structured Data in Alluxio: Example
March 11, 2020
In the previous article, I described the concept and design of the Structured Data Service in the Alluxio 2.1.0 release. This article will go through an example to demonstrate how it helps SQL and structured data workloads.
Alluxio 2.2.0 has been released since the previous article. If you are trying this service for the first time, I recommend using Alluxio 2.2.0. This tutorial requires Presto and Hive to be configured together and running.
Step1: Download and Set Up Alluxio
Download and Deploy Alluxio 2.2.0
Download the Alluxio 2.2.0 release and deploy Alluxio on your local computer. Detailed instructions can be found here. The following is a summary of the commands mentioned:
$ tar xf alluxio-2.2.0-bin.tar.gz
$ cd alluxio-2.2.0 # this directory corresponds to ${ALLUXIO_HOME}
$ cp conf/alluxio-site.properties.template conf/alluxio-site.properties
$ echo "alluxio.master.hostname=localhost" >> conf/alluxio-site.properties
$ echo "alluxio.master.mount.table.root.ufs=/tmp" >> conf/alluxio-site.properties
$ ./bin/alluxio-mount.sh SudoMount
$ ./bin/alluxio format
$ ./bin/alluxio-start.sh local -f
Note that no additional configuration is needed to start the new Structured Data Service.
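The echo commands above append a new entry every time they are run, so repeating the setup leaves duplicate keys in conf/alluxio-site.properties. A minimal sketch of an idempotent alternative follows. Note that set_property is a hypothetical helper written for this example and is not part of the Alluxio distribution.

```shell
# set_property FILE KEY VALUE
# Hypothetical helper (not shipped with Alluxio): drops any existing line that
# starts with "KEY=" and then appends KEY=VALUE, so reruns never duplicate a key.
set_property() {
  prop_file=$1
  prop_key=$2
  prop_value=$3
  prop_tmp=$(mktemp) || return 1
  if [ -f "$prop_file" ]; then
    # Keep every line that does not begin with exactly "KEY=".
    awk -v k="$prop_key" 'index($0, k "=") != 1' "$prop_file" > "$prop_tmp"
  fi
  printf '%s=%s\n' "$prop_key" "$prop_value" >> "$prop_tmp"
  # Copy the contents back so the original file keeps its permissions.
  cat "$prop_tmp" > "$prop_file"
  rm -f "$prop_tmp"
}
```

For example, set_property conf/alluxio-site.properties alluxio.master.hostname localhost could stand in for the first echo command, and running it twice still leaves a single entry for that key.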
Install and Configure the Alluxio Presto Connector
The Alluxio Presto Connector is the client for Presto to access Alluxio’s Structured Data Service. In this developer preview version, the connector must be copied manually into Presto.
This connector is bundled as part of the Alluxio 2.2.0 release in the directory ${ALLUXIO_HOME}/client/presto/plugins/. Copy the directory corresponding to the Presto version into Presto’s plugin directory.
$ cp -R ${ALLUXIO_HOME}/client/presto/plugins/presto-hive-alluxio-319/ ${PRESTO_HOME}/plugin/hive-alluxio/
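The copy above hard-codes Presto 319. As a rough sketch, the same step can be wrapped so that it fails with a clear message when no bundled connector matches the Presto version in use. Here install_connector is a hypothetical helper, ALLUXIO_HOME and PRESTO_HOME are assumed to be set, and the presto-hive-alluxio-<version> naming pattern is an assumption inferred from the single directory name shown above.

```shell
# install_connector PRESTO_VERSION
# Hypothetical helper: copies the bundled connector for the given Presto version
# into Presto's plugin directory, or reports which connectors are available.
# Assumes directories are named presto-hive-alluxio-<version>.
install_connector() {
  conn_version=$1
  conn_src="${ALLUXIO_HOME}/client/presto/plugins/presto-hive-alluxio-${conn_version}"
  conn_dest="${PRESTO_HOME}/plugin/hive-alluxio"
  if [ ! -d "$conn_src" ]; then
    echo "No bundled connector for Presto ${conn_version}; available:" >&2
    ls "${ALLUXIO_HOME}/client/presto/plugins" >&2
    return 1
  fi
  mkdir -p "$conn_dest"
  # Copy the contents of the source directory, including hidden files.
  cp -R "$conn_src"/. "$conn_dest"/
}
```

Calling install_connector 319 performs the same copy as the command above.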
Once the connector is installed, it can be used to configure a Presto catalog. Add a new catalog configuration to Presto by creating the following file:
$ echo "connector.name=hive-alluxio
hive.metastore=alluxio
hive.metastore.alluxio.master.address=localhost:19998" > ${PRESTO_HOME}/etc/catalog/catalog_alluxio.properties
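The catalog file needs each of its three properties on a separate line, and a file with the properties run together is easy to produce when copying commands from a web page. The following sketch checks for that; check_catalog is a hypothetical helper for this example, not a Presto or Alluxio tool.

```shell
# check_catalog FILE
# Hypothetical helper: succeeds only if each property the Alluxio catalog needs
# starts its own line in FILE.
check_catalog() {
  catalog_file=$1
  for catalog_key in connector.name hive.metastore hive.metastore.alluxio.master.address; do
    if ! grep -q "^${catalog_key}=" "$catalog_file"; then
      echo "missing or malformed property: ${catalog_key}" >&2
      return 1
    fi
  done
}
```

Running check_catalog ${PRESTO_HOME}/etc/catalog/catalog_alluxio.properties prints nothing and returns success when the file is laid out correctly.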
Restart the Presto server for the connector and configuration to take effect.
Step2: Attach a Hive Metastore to Alluxio Catalog Service
The Alluxio Catalog Service manages the metadata of structured data components such as databases, tables, and schemas. It also tracks the location of the stored data. This developer preview version supports attaching a Hive Metastore to the Alluxio Catalog Service as an UnderDatabase, which is an abstraction over external catalogs and databases.
To attach the Hive Metastore to the Alluxio Catalog Service, use the “attachdb” command. Its arguments are the UnderDatabase type (hive), the Hive Metastore URI, and the name of the Hive database to attach:
$ ./bin/alluxio table attachdb hive thrift://localhost:9083 hive_db_name
Step3: Use Alluxio Structured Data Management with Presto
Once a database is attached, the catalog service can be used from Presto. Start the Presto CLI with the Alluxio catalog:
$ presto --catalog catalog_alluxio
Any queries run within this CLI will access the Alluxio Catalog Service via the provided connector. The Alluxio Catalog Service will automatically serve the table information from the Hive Metastore, while transparently using the Alluxio mounted locations.
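For scripted checks, the same catalog can also be queried without an interactive session. The sketch below assumes the Presto CLI is available on the PATH as presto, as in the command above, and relies on its --execute option. The name run_alluxio_query is a hypothetical wrapper chosen for this example.

```shell
# run_alluxio_query SQL
# Hypothetical wrapper: runs a single statement through the Presto CLI against
# the catalog defined in catalog_alluxio.properties, then exits.
run_alluxio_query() {
  presto --catalog catalog_alluxio --execute "$1"
}
```

For instance, run_alluxio_query "SHOW TABLES FROM hive_db_name" should list the tables served for the attached database, assuming the database appears in Presto as a schema under the name used with attachdb in Step2.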
Transform a Table
Data transformation is a key benefit of working with structured data in Alluxio, particularly when the underlying files that make up a table are not stored in a compute-optimized fashion. If the files are in CSV format or the table is split among many small files, the Alluxio Transformation Service can convert the format to Parquet or combine multiple small files into larger files.
To transform the test table in Hive:
$ ./bin/alluxio table transform hive_db_name test_table
For more on Data Transformations, see documentation here.
Try it out!
Alluxio Structured Data Management is an exciting new effort that provides further benefits for SQL frameworks. Get started with the Alluxio Structured Data Service with Presto, and let us know about feature requests and issues in the Alluxio GitHub repository! On behalf of the entire Alluxio open source community, I invite you to ask questions in our community Slack channel whenever you run into problems.