How do you orchestrate data between disparate storages?

As the data ecosystem within an enterprise grows, we see not only an increase in total data volume but also an increase in the number of disparate storage systems that house it. The challenge then becomes giving different applications and teams an efficient way to access data across these various subsystems without spending a nontrivial amount of time building data access workflows.

Data Orchestration with Alluxio

As the data ecosystem matures, we see one trend commonly repeated. When developers and teams cannot get reliable performance and access, they often resort to creating a new computation framework, a new storage system, and ultimately an entirely new stack. While this may serve as a temporary solution, over time it only magnifies the issue at hand.

Alluxio's answer to this problem is a new layer: a data orchestration platform that sits between compute and storage. The need for this layer becomes more evident when we realize that our environment should be able to run agnostic of compute, storage, or cloud.

With an easy-to-use, feature-rich set of commands, the Alluxio command line is also comfortable to navigate for any developer who has previously used a POSIX environment.

[alluxio@ip-172-31-38-93 alluxio]$ ./bin/alluxio fs ls /

drwxr-xr-x alluxio        alluxio  24 PERSISTED 05-28-2019 12:33:47:944  DIR /default_tests_files

drwxr-xr-x alluxio        alluxio   3 NOT_PERSISTED 05-28-2019 17:49:30:084  DIR /hdfs

drwxr-xr-x alluxio        alluxio   4 NOT_PERSISTED 05-28-2019 17:49:33:295  DIR /s3

drwxrwxrwx alluxio        alluxio   2 PERSISTED 05-28-2019 16:40:52:914  DIR /testdfsio

In the listing above, a single ‘ls’ command shows directories from multiple different storage systems, such as /hdfs and /s3, all accessible through Alluxio. These storage systems are effectively ‘mounted’ into the Alluxio namespace using the Alluxio fs mount command.
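As a rough sketch, the /hdfs and /s3 entries in the listing could have been created with mount commands along the following lines. The under-storage URIs here (the HDFS namenode address and the S3 bucket name) are hypothetical placeholders, not values from this cluster, and the s3a:// scheme is an assumption that may differ between Alluxio versions. An S3 mount typically also needs credentials, which can be supplied through the mount command's --option flag or the site configuration; they are left out here. The small run helper is not part of Alluxio: it only prints each command by default so the sequence can be reviewed before it is executed against a real cluster with DRY_RUN=0.

```shell
#!/bin/sh
# Sketch only: the URIs below are hypothetical placeholders.
ALLUXIO_BIN="${ALLUXIO_BIN:-./bin/alluxio}"
HDFS_URI="hdfs://namenode.example.com:9000/data"   # assumed HDFS location
S3_URI="s3a://example-bucket/data"                 # assumed S3 bucket and prefix

# Helper (not an Alluxio feature): print the command unless DRY_RUN=0,
# in which case actually execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Mount each under storage at a path in the Alluxio namespace:
#   alluxio fs mount <alluxioPath> <ufsURI>
run "$ALLUXIO_BIN" fs mount /hdfs "$HDFS_URI"
run "$ALLUXIO_BIN" fs mount /s3 "$S3_URI"

# Both storage systems should now appear under a single root listing.
run "$ALLUXIO_BIN" fs ls /
```

Once mounted, the same Alluxio paths can be used by any client regardless of which storage system actually holds the data.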

This idea of virtualization means developers and application teams no longer need to worry about which compute or storage they are using. As long as Alluxio can speak an interface compatible with the compute framework or storage system in question, it can integrate seamlessly into the environment.