Moving From Apache Thrift to gRPC: A Perspective From Alluxio

As part of the Alluxio 2.0 release, we have moved our RPC framework from Apache Thrift to gRPC. In this article, we will talk about the reasons behind this change as well as some lessons we learned along the way. Alluxio is an open-source distributed virtual file system, acting as the data access layer that enables big data and … Continued

Unified Big Data Analytics: Any Stack, Any Cloud

This presentation focuses on how Alluxio enables the big data analytics stack to be cloud-native. Today’s cloud object storage systems provide more cost-effective and scalable storage solutions, but they also have different semantics and performance implications compared to HDFS. Applications like Spark or Presto do not benefit from node-level locality or cross-job caching when retrieving data from cloud object storage. Deploying Alluxio to access cloud storage solves these problems because data is retrieved and cached in Alluxio instead of being fetched repeatedly from the underlying cloud or object storage.

Providing a Unified Data Layer at Memory Speed for Cloud Environments with Huawei and Alluxio

The cloud is rapidly becoming ubiquitous, with continued adoption focused on the flexibility and cost benefits of a utility infrastructure model. Enterprises are increasingly taking a “data first” view of infrastructure, which demands a new way of thinking in a world in which data is stored and accessed from multiple locations and providers. Performance and interoperability challenges, however, can present obstacles to cloud adoption and complicate data management. Techniques such as data silos, ETL processes, and multiple data copies, which are commonly employed to accommodate cloud limitations, often offset the expected benefits of cloud infrastructure. Alluxio offers a new way to enhance the benefits of cloud infrastructure without the performance limitations or interoperability challenges that result from accessing disparate data sources in multiple, often remote, locations.

Alluxio Overview

Alluxio is an open-source software solution that connects analytics applications to heterogeneous data sources through a data orchestration layer that sits between compute and storage. It runs on commodity hardware, creating a shared data layer that abstracts the files or objects in underlying persistent storage systems. Applications connect to Alluxio via a standard interface, accessing data from a single unified source. This white paper discusses the data center challenges Alluxio addresses and the benefits it provides, and gives an overview of how it works.

Alluxio Architecture and Data Flow

Alluxio was created because we saw a need for innovation at the data layer, arising from the growing complexity of connecting multiple compute frameworks to an ever-expanding mix of storage systems and formats. Our approach uses a memory-centric architecture that abstracts files and objects in underlying persistent storage systems and provides a shared data access … Continued