Alluxio (formerly Tachyon): Unified Namespace and Tiered Storage
We know that storage resources are not all equal, but getting distributed big data applications to take advantage of this difference, or even simply to understand it, is very difficult. Alluxio Inc. has developed the Alluxio unified namespace and tiered storage to address this problem, combining two simple yet highly effective ideas:
- Beyond memory, Alluxio manages other storage resources such as SSDs and migrates data among the different storage types, giving computation frameworks a much larger capacity at close-to-optimal throughput.
- Alluxio provides a unified namespace, which makes it possible to store, access, and manage data from heterogeneous data sources through a single namespace.
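The first idea above, migrating data among storage types, can be illustrated with a toy model. This is a minimal sketch and not Alluxio's implementation: the `TieredStore` class, its method names, the tier labels, and the least-recently-used policy are all assumptions made for illustration. Writes and reads land in the fastest tier, and a full tier demotes its least recently used block one tier down.

```python
from collections import OrderedDict


class TieredStore:
    """Toy tiered block store (hypothetical; not Alluxio's code)."""

    def __init__(self, tiers):
        # tiers: list of (name, capacity_in_blocks), fastest first.
        # Each capacity is assumed to be at least 1.
        self.names = [name for name, _ in tiers]
        self.capacity = [cap for _, cap in tiers]
        # One dict per tier, ordered from least to most recently used.
        self.blocks = [OrderedDict() for _ in tiers]

    def _insert(self, level, block_id, data):
        if level >= len(self.blocks):
            return  # fell past the slowest tier: the block leaves the store
        tier = self.blocks[level]
        if len(tier) >= self.capacity[level]:
            # Tier is full: demote its least recently used block.
            victim_id, victim_data = tier.popitem(last=False)
            self._insert(level + 1, victim_id, victim_data)
        tier[block_id] = data

    def write(self, block_id, data):
        # Drop any stale copy, then place the block in the fastest tier.
        for tier in self.blocks:
            tier.pop(block_id, None)
        self._insert(0, block_id, data)

    def read(self, block_id):
        for tier in self.blocks:
            if block_id in tier:
                data = tier.pop(block_id)
                self._insert(0, block_id, data)  # promote on access
                return data
        return None  # not cached in any tier

    def tier_of(self, block_id):
        for name, tier in zip(self.names, self.blocks):
            if block_id in tier:
                return name
        return None


store = TieredStore([("MEM", 2), ("SSD", 2), ("HDD", 2)])
for block in ("a", "b", "c"):
    store.write(block, b"payload")
print(store.tier_of("a"))  # SSD: "a" was demoted when "c" arrived
store.read("a")
print(store.tier_of("a"))  # MEM: reading promoted "a" back to memory
```

The caller never moves a block by hand; placement follows from the access pattern alone, which is the sense in which tier management is transparent.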
Calvin Jia and Jiri Simsa explain how the current Alluxio tiered storage can be easily configured to use memory, SSDs, and hard drives in different tiers. Alluxio users and administrators do not have to migrate data manually, because Alluxio manages data transparently across all the configured tiers, similar to the way a CPU manages its L1, L2, and lower-level caches. Alluxio also gives users fine-grained control over data placement: they can plug in their own data-management strategies, pin files to a specific storage tier, or specify a TTL for files.
Calvin and Jiri also describe the interface for mounting heterogeneous data sources into the Alluxio namespace, which takes advantage of Alluxio’s ability to interoperate with different underlying storage systems such as HDFS, S3, GlusterFS, or Swift. Once a data source is mounted, operations such as creating, deleting, or renaming objects in the Alluxio namespace are transparently mapped onto the corresponding objects in the namespace of the underlying storage system. Furthermore, information about mounted data sources is managed centrally by the Alluxio master service, which facilitates reconfiguration.
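The mapping from the unified namespace onto underlying storage systems can be sketched as a mount table with longest-prefix matching. This is an illustrative model under stated assumptions, not Alluxio's API: the `MountTable` class, its methods, and the host and bucket names are hypothetical.

```python
class MountTable:
    """Toy mount table (hypothetical; not Alluxio's code)."""

    def __init__(self):
        # Maps a namespace path prefix to an under-storage URI prefix.
        self._mounts = {}

    def mount(self, namespace_path, storage_uri):
        # Normalize trailing slashes so "/s3/" and "/s3" are the same mount.
        point = namespace_path.rstrip("/") or "/"
        self._mounts[point] = storage_uri.rstrip("/")

    def resolve(self, path):
        """Translate a namespace path into the under-storage URI it maps to."""
        best = None
        for point in self._mounts:
            prefix = point.rstrip("/") + "/"
            if path == point or path.startswith(prefix):
                # The longest matching mount point wins.
                if best is None or len(point) > len(best):
                    best = point
        if best is None:
            raise KeyError(f"no mount point covers {path!r}")
        suffix = path[len(best.rstrip("/")):]
        return self._mounts[best] + suffix


table = MountTable()
table.mount("/", "hdfs://namenode:9000/")   # hypothetical HDFS cluster
table.mount("/s3", "s3://bucket/data")      # hypothetical S3 bucket
print(table.resolve("/users/alice/part-0"))  # hdfs://namenode:9000/users/alice/part-0
print(table.resolve("/s3/logs/x.txt"))       # s3://bucket/data/logs/x.txt
```

Because the table lives in one place, remounting a path changes where every client's operations land without touching the clients, which mirrors the role the abstract gives to the central master service.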