In this article, you will learn how to use Alluxio to implement a unified distributed file system service, and how to build extensions on top of Alluxio, including customized authentication schemes and user-defined functions (UDFs) on Alluxio files.
Tag: distributed systems
Grafana, a metrics visualization tool, ties into this process by pulling the metrics that systems like Alluxio emit through a sink and visualizing them in a more helpful fashion. This guide covers how to set up Grafana and Graphite, a supported Alluxio sink that stores metrics in a time-series database, and explores some of the possibilities that the combination offers.
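To make the role of a Graphite sink concrete, here is a minimal Python sketch of Graphite's plaintext protocol, in which each metric is one line of the form `<path> <value> <timestamp>`. It is not Alluxio's sink implementation. The metric name `demo.cluster.files_created`, the helper names, and the `localhost:2003` endpoint are assumptions chosen for illustration.

```python
import socket
import time


def format_graphite_line(path, value, timestamp=None):
    """Render one metric as 'path value timestamp' followed by a newline."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_metrics(lines, host="localhost", port=2003):
    """Send pre-formatted lines to a Carbon plaintext listener.

    The host and port are assumptions; adjust them to your Graphite setup.
    """
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)


# Hypothetical metric name, used only to show the wire format.
sample = format_graphite_line("demo.cluster.files_created", 42, 1700000000)
```

Once such lines reach Graphite, Grafana can query the stored series by path and plot them over time.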
This Alluxio Meetup features a chance to interact with other Alluxio users and developers, as well as three talks. Thanks to our joint host Data Council!
In this talk, we will focus on Alluxio's design: its architecture, data flow, and metadata flow. We will dive into the choices in its design space and share our experience implementing features like data tiering, storage options, and cache eviction policies. We will also share lessons in design, implementation, and operation learned while building an open source distributed storage system with 900 contributors over 5+ years.
Enterprises are increasingly looking to object stores to power their big data and machine learning workloads cost-effectively. The combination of SwiftStack and Alluxio enables users to move seamlessly towards a disaggregated architecture.
Tachyon is a memory-centric, fault-tolerant distributed storage system that enables reliable file sharing at memory speed. It originated in 2012 at AMPLab, UC Berkeley, the same lab that produced Apache Mesos and Apache Spark. Soon after, it became an open source project, and it is now deployed at many companies. Since then, Tachyon has attracted more than 200 contributors from over 50 institutions. In 2015, the company Tachyon Nexus was founded to further accelerate the development of Tachyon. In this talk, we will review Tachyon’s new features, deployments, and developments in 2015, and look ahead to 2016.
During the past several years, Spark has significantly changed the landscape of big data computing. It improves performance of various applications dramatically. However, in certain Spark use cases, the bottleneck is in the I/O stack. In this talk, we will introduce Tachyon, a distributed memory-centric storage system. In addition, we will talk about several production use cases where Tachyon further improves Spark applications’ performance by orders of magnitude.
In the presentation, we will explore several potential industry use cases enabled by the new features. One-click cluster deployment lets users experiment and prototype with Tachyon on AWS, launching not only Tachyon but also the computation framework and storage system of their choice. Mounting multiple under storage systems and transparent naming enable more exciting use cases for Tachyon users.
Calvin Jia and Jiri Simsa explain how Alluxio's current tiered storage can be easily configured to use memory, SSDs, and hard drives as different tiers. Alluxio users and administrators do not have to migrate data manually, because Alluxio manages data transparently across all the configured tiers, similar to the way a CPU manages L1, L2, and lower-level caches. Alluxio also gives users fine-grained control over data so that they can plug in their own data-management strategies; users can also pin files in Alluxio to a specific storage or specify a TTL for files. Calvin and Jiri also describe the interface for managing heterogeneous data sources within the Alluxio namespace, which takes advantage of Alluxio’s ability to interoperate with different underlying storage systems such as HDFS, S3, GlusterFS, or Swift.
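The tiering idea can be illustrated with a toy Python model. This is a sketch under stated assumptions, not Alluxio's implementation: the `TieredStore` class and its methods are hypothetical, capacity is counted in entries rather than bytes, eviction is assumed to be least-recently-used, and TTLs are not modeled. Entries evicted from a faster tier are demoted to the next slower one, reads promote entries back to the fastest tier, and pinned keys are skipped by eviction.

```python
from collections import OrderedDict


class TieredStore:
    """Toy tiered cache: tiers are ordered fastest first, each with an entry-count capacity."""

    def __init__(self, tiers):
        # tiers: list of (name, capacity) pairs, e.g. [("MEM", 2), ("SSD", 2), ("HDD", 4)]
        self.names = [name for name, _ in tiers]
        self.capacity = [cap for _, cap in tiers]
        self.data = [OrderedDict() for _ in tiers]  # each dict is kept in LRU order, oldest first
        self.pinned = set()

    def put(self, key, value):
        """Write into the fastest tier, demoting colder entries as needed."""
        self._remove(key)
        self._insert(0, key, value)

    def get(self, key):
        """Return the value and promote the entry to the fastest tier, or None if absent."""
        for tier in self.data:
            if key in tier:
                value = tier.pop(key)
                self._insert(0, key, value)
                return value
        return None

    def pin(self, key):
        """Mark a key so that eviction skips it."""
        self.pinned.add(key)

    def tier_of(self, key):
        """Return the name of the tier currently holding the key, or None."""
        for name, tier in zip(self.names, self.data):
            if key in tier:
                return name
        return None

    def _remove(self, key):
        for tier in self.data:
            tier.pop(key, None)

    def _insert(self, level, key, value):
        if level >= len(self.data):
            return  # fell off the slowest tier; this toy model simply drops the entry
        tier = self.data[level]
        tier[key] = value
        while len(tier) > self.capacity[level]:
            # Pick the least recently used entry that is not pinned.
            victim = next((k for k in tier if k not in self.pinned), None)
            if victim is None:
                break  # everything here is pinned; let the tier overflow in this toy model
            self._insert(level + 1, victim, tier.pop(victim))


store = TieredStore([("MEM", 2), ("SSD", 2)])
store.put("a", 1)
store.put("b", 2)
store.pin("a")
store.put("c", 3)  # MEM is full, so the unpinned "b" is demoted to SSD
```

After these calls, `store.tier_of("b")` reports the SSD tier while the pinned key `"a"` stays in memory; a later `store.get("b")` promotes `"b"` back to the memory tier and demotes `"c"` in its place.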