Apache Spark DataFrame caching with Alluxio

Many organizations deploy Alluxio together with Spark for performance gains and data manageability benefits. Qunar recently deployed Alluxio in production, and their Spark Streaming jobs sped up by 15x on average and by up to 300x during peak times. They noticed that some Spark jobs would slow down or fail to finish without Alluxio, but with Alluxio, those jobs finished quickly. In this blog post, we investigate how Alluxio helps Spark be more effective. Alluxio improves the performance of Spark jobs, helps them perform more predictably, and enables multiple Spark jobs to share the same data from memory.

Cray Analytics and Alluxio – Wrangling Enterprise Storage

For a business not just to survive but to flourish, it has become imperative to make decisions almost immediately, continuously pivot strategy and tactics, and merge streams of inquiry into meaningful action. Executing on this requires high-frequency insights, the competitive advantage in today’s frenetic business landscape. Together with Alluxio, Inc., we enable businesses to gain the … Continued

Enhancing the Value of Alluxio with Samsung NVMe SSDs

Alluxio, formerly Tachyon, is the world’s first system that unifies data at memory speed while achieving affordability through its tiered storage functionality. This Samsung whitepaper shows how Alluxio’s storage can be used with the different storage media available in a system, including NVMe SSDs, while providing in-line performance consistent with the speed of the underlying storage media. Alluxio can leverage all the storage that is available in a system.

Whitepaper: Using Alluxio to Improve the Performance and Consistency of HDFS Clusters

Alluxio is the world’s first memory-speed virtual distributed storage system that bridges applications and underlying storage systems, providing unified data access orders of magnitude faster than existing solutions. The Hadoop Distributed File System (HDFS) is a distributed file system for storing large volumes of data. HDFS popularized the paradigm of bringing computation to data and … Continued

Whitepaper: Accelerating On-Demand Data Analytics with Alluxio

This whitepaper consists of two parts. The first is a high-level overview of the advantages of using Alluxio as a core technology with on-demand clusters. The second is intended for engineers; it provides a detailed step-by-step guide to deploying an on-demand cluster with Alluxio and instructions for running a sample workload on the cluster. By the end of the paper you will have a good understanding of how to deploy this architecture and the value Alluxio brings to the stack.

  • Memory speed data access.
  • Efficient data sharing between applications.
  • Transparent data access to storage systems.
  • Reduced memory footprint.

Unified Namespace: Allowing Applications To Access Data Anywhere

Introduction: The exponential growth of raw computational power, communication bandwidth, and storage capacity results in continuous innovation in how data is processed and stored. To address the evolving nature of the compute and storage landscape, we are continuously advancing Alluxio, a state-of-the-art memory-centric virtual distributed storage system. This blog post highlights unified namespace, an … Continued

Tachyon: Reliable, Memory Speed Storage For Cluster Computing Frameworks

Tachyon is a distributed file system enabling reliable data sharing at memory speed across cluster computing frameworks. While caching today improves read workloads, writes are either network or disk bound, as replication is used for fault-tolerance. Tachyon eliminates this bottleneck by pushing lineage, a well-known technique, into the storage layer. The key challenge in making … Continued

Reliable, Memory Speed Storage For Cluster Computing Frameworks

Tachyon is a distributed file system enabling reliable data sharing at memory speed across cluster computing frameworks. While caching today improves read workloads, writes are either network or disk bound, as replication is used for fault-tolerance. Tachyon eliminates this bottleneck by pushing lineage, a well-known technique borrowed from application frameworks, into the storage layer. The … Continued
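
The excerpt above describes recovering data through lineage rather than replication. As a rough illustration of that idea only, here is a minimal Python sketch. The `LineageStore` class and all of its method names are hypothetical and are not part of Tachyon's or Alluxio's API. It assumes that derivation functions are deterministic and that source data lives in durable storage.

```python
from typing import Callable, Dict, List, Tuple


class LineageStore:
    """Toy store that keeps one in-memory copy and recomputes lost data.

    Hypothetical illustration; not Tachyon's or Alluxio's actual API.
    """

    def __init__(self) -> None:
        self._durable: Dict[str, List[int]] = {}  # stands in for persistent under-storage
        self._memory: Dict[str, List[int]] = {}   # single in-memory copy, no replicas
        # Lineage: dataset name -> (deterministic function, input dataset names)
        self._lineage: Dict[str, Tuple[Callable[..., List[int]], Tuple[str, ...]]] = {}
        self.recomputations = 0

    def put_source(self, name: str, values: List[int]) -> None:
        """Register source data that is assumed to be durable."""
        self._durable[name] = list(values)

    def derive(self, name: str, fn: Callable[..., List[int]], *inputs: str) -> None:
        """Compute a dataset once and record how it was produced."""
        self._lineage[name] = (fn, inputs)
        self._memory[name] = fn(*(self.get(i) for i in inputs))

    def lose(self, name: str) -> None:
        """Simulate losing the in-memory copy, e.g. on a node failure."""
        self._memory.pop(name, None)

    def get(self, name: str) -> List[int]:
        """Return a dataset, recomputing it from lineage if it was lost."""
        if name in self._durable:
            return self._durable[name]
        if name not in self._memory:
            fn, inputs = self._lineage[name]
            # Inputs are fetched recursively, so lost ancestors are rebuilt first.
            self._memory[name] = fn(*(self.get(i) for i in inputs))
            self.recomputations += 1
        return self._memory[name]


store = LineageStore()
store.put_source("raw", [1, 2, 3, 4])
store.derive("squares", lambda xs: [x * x for x in xs], "raw")
store.derive("total", lambda xs: [sum(xs)], "squares")

# Drop both derived datasets, then read the final one back.
store.lose("squares")
store.lose("total")
recovered = store.get("total")
print(recovered, store.recomputations)
```

In this sketch, writes need no replica because a lost dataset is rebuilt from the recorded function and its inputs. The trade-off is recomputation time on failure, which a real system would have to bound.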