Announcing Alluxio Data Orchestration Hub

We’re pleased to announce the general availability of Alluxio Data Orchestration Hub, your single pane of glass to orchestrate data for analytics and AI. The data ecosystem has grown complex as storage and compute are separated across data centers and cloud providers. With this release, we’ve made great strides toward simplifying data access and management across multiple environments.

Data Orchestration Hub, or the Hub, is a management console that makes it easy to manage an analytics cluster and connect it with multiple data sources to unify data lakes. The service provides an easy-to-use, unified management view for configuration and monitoring, along with wizard-based curation of deployment workflows.

  • Connect Your Data Sources: Connect Alluxio to data storage and catalogs across multiple clouds, single cloud or on-premises using guided wizards.
  • Monitor Your Alluxio Cluster: View a dashboard showing the state of the processes on your cluster.
  • Manage Configuration: Set and distribute configuration for a cluster.

Alluxio Data Orchestration Hub is available immediately for all Alluxio deployment scenarios with compute engines like Presto, Spark and TensorFlow. The Hub is ready to use out of the box with Amazon EMR and Google Dataproc, and can also be used on other platforms. Please visit the documentation for more information on trying out the Hub.

When to Use

Connecting to data sources across regions

The Hub provides self-guided wizards to allow users to connect to data sources and catalogs in the same or remote data centers. A user is guided through the required configuration steps along with validation of the connection.

These wizards apply to multiple scenarios, including hybrid cloud, cross-data center, single cloud and private data center deployments. Manage your compute clusters with Alluxio using these easy-to-use wizards.

Connect Alluxio to all your data sources across multiple clouds, single cloud or on-premises using self-guided wizards.

Managing an Alluxio cluster

The Hub can be used to view a dashboard to monitor the state of processes on the cluster, as well as update configuration and restart processes. This is especially useful for cloud deployments without access to SSH for configuration and process management.

Monitor the status of an Alluxio cluster anywhere. You can start or stop cluster components from an intuitive UI.

What’s Next

To start using Alluxio Data Orchestration Hub, simply launch Alluxio-enabled clusters in your on-premises or cloud deployment. Further changes to the cluster, as well as monitoring, can now be managed using the Hub:

  • Process Management: Monitor the status of each process in the Alluxio cluster, and start or stop processes.
  • Connect Data Storage: Connect Alluxio to your data sources, such as HDFS / S3 / GCS, across a hybrid cloud, single cloud or on-premises.
  • Connect Data Catalog: Configure structured data catalogs for OLAP engines like Presto on Alluxio. Connect to existing catalog definitions to prevent re-definition of table metadata.
  • Advanced Configuration: Customize your Alluxio cluster with advanced options for setting and distributing configuration from the central console.

If you would like more information on Data Orchestration Hub and the supported toolset, please read the release notes.

Have questions? Come join the Community Slack Channel.

Read the Alluxio 2.4 release product blog to learn more about the expanded features and capabilities to advance analytics and AI in the cloud.