Alluxio is an open source software solution that connects analytics applications to heterogeneous data sources through a data orchestration layer that sits between compute and storage. It runs on commodity hardware, creating a shared data layer abstracting the files or objects in underlying persistent storage systems. Applications connect to Alluxio via a standard interface, accessing data from a single unified source. This white paper discusses the data center challenges Alluxio addresses and the benefits it provides, and gives an overview of how it works.
Tag: compute storage separation
Alluxio was created because we saw a need for innovation at the data layer, arising from the growing complexity of connecting multiple compute frameworks to an ever-expanding mix of storage systems and formats. Our approach uses a memory-centric architecture that abstracts files and objects in underlying persistent storage systems and provides a shared data access … Continued
Learn how Intel uses Alluxio to accelerate big data analytics in the cloud, and explore new opportunities for persistent memory in architectures with separated compute and storage.
Learn more about data unification for the digital economy and how Alluxio’s data orchestration brings your data to your compute, wherever it’s located.
JD.com is China’s largest online retailer and its biggest overall retailer, as well as the country’s biggest internet company by revenue. Currently, JD.com’s BDP platform runs more than 400,000 jobs (15+ PB) daily, on a system with more than 15,000 cluster nodes and a total capacity of 210 PB.
Alluxio has run in JD.com’s production environment on 100 nodes for six months. See how JD.com uses Alluxio to provide support for ad hoc and real-time stream computing, using Alluxio-compatible HDFS URLs and Alluxio as a pluggable optimization component.
TalkingData, China’s largest data broker, provides data intelligence solutions and processes over 20 terabytes of data and more than one billion session requests per day. TalkingData deployed Alluxio to unify disparate cloud, on-premises, and hybrid data sources for a range of analytics applications. The architecture provides self-service data access for data scientists and engineers, eliminating the … Continued
Alluxio presents a set of disparate data stores as a single file system, greatly reducing the complexity of the storage APIs and semantics exposed to applications. Alluxio is designed with a memory-centric architecture, enabling applications to leverage memory-speed I/O simply by using Alluxio. Alluxio has been deployed in production at hundreds of leading companies, … Continued
Enabling Decoupled Compute and Storage with Alluxio: This blog explores the benefits Alluxio brings to data platforms, including the trends behind the rise of decoupled compute-storage architectures; how Alluxio addresses data access issues for decoupled compute-storage architectures; and an example of Alluxio’s benefits using a SparkSQL workload. Motivation: The primary appeal of a coupled compute-storage architecture, … Continued
Faster On-Demand Clusters: This is an excerpt from the Accelerating On-Demand Data Analytics with Alluxio whitepaper, which includes a detailed implementation guide in addition to this high-level overview. In the Big Data world, it is often the case that only a subset of the total data is relevant for answering the question at hand. As a … Continued