Whitepaper: Using Alluxio to Improve the Performance and Consistency of HDFS Clusters

Alluxio is the world’s first memory-speed virtual distributed storage system that bridges applications and underlying storage systems, providing unified data access orders of magnitude faster than existing solutions. The Hadoop Distributed File System (HDFS) is a distributed file system for storing large volumes of data. HDFS popularized the paradigm of bringing computation to data and … Continued

Whitepaper: Accelerating On-Demand Data Analytics with Alluxio

This whitepaper has two parts. The first is a high-level overview of the advantages of using Alluxio as a core technology with on-demand clusters. The second is intended for engineers; it provides a detailed step-by-step guide to deploying an on-demand cluster with Alluxio, along with instructions for running a sample workload on the cluster. By the end of the paper you will have a good understanding of how to deploy this architecture and the value Alluxio brings to the stack.

  • Memory speed data access.
  • Efficient data sharing between applications.
  • Transparent data access to storage systems.
  • Reduced memory footprint.

Unified Namespace: Allowing Applications To Access Data Anywhere

Introduction: The exponential growth of raw computational power, communication bandwidth, and storage capacity results in continuous innovation in how data is processed and stored. To address the evolving nature of the compute and storage landscape, we are continuously advancing Alluxio, a state-of-the-art memory-centric virtual distributed storage system. This blog post highlights unified namespace, an … Continued

Tachyon: Reliable, Memory Speed Storage For Cluster Computing Frameworks

Tachyon is a distributed file system enabling reliable data sharing at memory speed across cluster computing frameworks. While caching today improves read workloads, writes are either network- or disk-bound, as replication is used for fault tolerance. Tachyon eliminates this bottleneck by pushing lineage, a well-known technique, into the storage layer. The key challenge in making … Continued

Reliable, Memory Speed Storage For Cluster Computing Frameworks

Tachyon is a distributed file system enabling reliable data sharing at memory speed across cluster computing frameworks. While caching today improves read workloads, writes are either network- or disk-bound, as replication is used for fault tolerance. Tachyon eliminates this bottleneck by pushing lineage, a well-known technique borrowed from application frameworks, into the storage layer. The … Continued

Tachyon: Memory Throughput I/O For Cluster Computing Frameworks

As ever more big data computations move in-memory, I/O throughput dominates the running times of many workloads. For distributed storage, read throughput can be improved using caching; however, write throughput is limited by both disk and network bandwidth due to data replication for fault tolerance. This paper proposes a new file system … Continued