Moving From Apache Thrift to gRPC: A Perspective From Alluxio

As part of the Alluxio 2.0 release, we have moved our RPC framework from Apache Thrift to gRPC. In this article, we will talk about the reasons behind this change as well as some lessons we learned along the way. Alluxio is an open-source distributed virtual file system, acting as the data access layer that enables big data and … Continued

Unified Data Access In Virtual Reality

In a recent blog post, we discussed the ideation, design, and new features in the Alluxio 2.0 preview. Today we are thrilled to announce another revolutionary project that the Alluxio engineering team has been hard at work on for the past year – the Alluxio Virtual Reality (VR) client. One of the biggest obstacles for new Alluxio users … Continued

Testing Distributed Systems at 1000+ node Scale for the Cost of a Large Pizza, and yes, on AWS!

Testing distributed systems at scale is typically a costly yet necessary process. At Alluxio we take testing very seriously, as organizations across the world rely on our technology. A problem we therefore want to solve is how to test at scale without breaking the bank. In this blog we are going to show how the … Continued

Deploying Big Data Workloads on Object Storage Without Performance Penalty

Introduction: As the amount of data collected and analyzed by enterprises continues to grow unabated, more attention is being placed on managing the cost of storing the data relative to performance. Hadoop provides a scalable and fast way of storing and analyzing data; however, the cost of storing data in Hadoop is typically higher … Continued

Asynchronous Caching in Alluxio – High Performance for Partial Read Caching

Overview: An Alluxio cluster caches data from connected storage systems in memory to create a data layer that can be accessed concurrently by multiple application frameworks. This greatly improves performance for many analytics workloads. On-demand caching occurs when clients read blocks of data using a ‘CACHE’ read type from persistent storage systems connected to the … Continued

New Whitepaper: Structured Big Data Federation

Enterprises are adopting big data technologies to analyze and derive insight from their growing volumes of structured and unstructured data. A familiar problem is the requirement to analyze data from multiple independent storage silos concurrently. In order to consolidate the data, large enterprises typically use custom solutions or build a data lake. These approaches present … Continued

Enabling Decoupled Compute and Storage with Alluxio

This blog explores the benefits Alluxio brings to data platforms, including: the trends behind the rise of decoupled compute-storage architectures; how Alluxio addresses data access issues for decoupled compute-storage architectures; and an example of Alluxio’s benefits using a SparkSQL workload. Motivation: The primary appeal of a coupled compute-storage architecture, … Continued