Alluxio Overview: Open source data orchestration technology

Alluxio is an open source software solution that connects analytics applications to heterogeneous data sources through a data orchestration layer that sits between compute and storage. It runs on commodity hardware, creating a shared data layer abstracting the files or objects in underlying persistent storage systems. Applications connect to Alluxio via a standard interface, accessing data from a single unified source.

This white paper discusses the data center challenges Alluxio addresses, the benefits provided, and an overview of how it works.

Tags: , ,

Alluxio Architecture and Data Flow

Alluxio was created because we saw a need for innovation at the data layer rising from the growing complexity of connecting multiple compute frameworks to an ever-expanding mix of storage systems and formats. Our approach uses a memory-centric architecture that abstracts files and objects in underlying persistent storage systems and provides a shared data access layer for compute applications.

Alluxio is not a persistent storage system. Instead, Alluxio serves as a data access layer, residing between any persistent storage system (such as Amazon S3, Microsoft Azure Object Store, Apache HDFS or OpenStack Swift) and computation frameworks (such as Apache Spark, Presto or Hadoop MapReduce). This whitepaper provides a technical overview of the Alluxio architecture and describes the data flow for common read and write scenarios.

Tags: , , ,

A Case For Packing And Indexing In Cloud File Systems

Small (kilobyte-sized) objects are the bane of highly scalable cloud object stores. Larger (at least megabytesized) objects not only improve performance, but also result in orders of magnitude lower cost, due to the current operation-based pricing model of commodity cloud object stores. For example, in Amazon S3’s current pricing scheme, uploading 1GiB data by issuing … Continued

Tags: , , ,

Alluxio: A Virtual Distributed File System

The world is entering the data revolution era. Along with the latest advancements of the Internet, Artificial Intelligence (AI), mobile devices, autonomous driving, and Internet of Things (IoT), the amount of data we are generating, collecting, storing, managing, and analyzing is growing exponentially. To store and process these data has exposed tremendous challenges and opportunities. … Continued

Tags: , , ,

MOMO: Accelerating Ad Hoc Analysis with Spark SQL and Alluxio

From our friends at MOMO The hadoop ecosystem makes many distributed system/algorithms easier to use and generally lowers the cost of operations. However, enterprises and vendors are never satisfied with that, so higher performance becomes the next issue. We considered several options to address our performance needs and focused our efforts on Alluxio, which improves performance … Continued

Tags: , , , , ,

Structured Big Data Federation Using Alluxio

Enterprises are adopting big data technologies to analyze and derive insight from their growing volumes of structured and unstructured data. A familiar problem is the requirement to analyze data from multiple independent storage silos concurrently. In order to consolidate the data, large enterprises typically use custom solutions or build a data lake. These approaches present additional challenges and can be costly and time consuming.

Tags: , ,

Effective Spark DataFrames with Alluxio

Introduction Many organizations deploy Alluxio together with Spark for performance gains and data manageability benefits. Qunar recently deployed Alluxio in production, and their Spark streaming jobs sped up by 15x on average and up to 300x during peak times. They noticed that some Spark jobs would slow down or would not finish, but with Alluxio, those … Continued

Tags: , , , ,

Cray Analytics and Alluxio – Wrangling Enterprise Storage

For business to not just survive — but to flourish — it’s become imperative to make decisions with near immediacy, continuously pivot strategy and tactics, and merge streams of inquiries into meaningful action. Executing requires high-frequency insights — the competitive advantage in today’s frenetic business landscape. Together with Alluxio, Inc., we enable businesses to gain the … Continued

Enhancing the Value of Alluxio with Samsung NVMe SSDs

Alluxio, formerly Tachyon, is the world’s first system which unifies data at memory speeds while achieving affordability through Alluxio’s innovative tiered storage functionality. This Samsung whitepaper shows how Alluxio’s storage can be used with different storage media available in systems including NVME SSDs while providing in‐line performance consistent with the speed of the underlying storage media. Alluxio provides the capability to leverage all the storage that is available in a system.