Bay Area Meetup 2018 – Alluxio is the data orchestration layer between storage and compute, bringing your data closer to your Presto workloads for better performance on top of S3.
Tag: compute storage separation
The cloud is rapidly becoming ubiquitous, with continued adoption focused on the flexibility and cost benefits of a utility infrastructure model. Enterprises are increasingly taking a “data first” view of infrastructure, which demands a new way of thinking in a world in which data is stored and accessed from multiple locations and providers. Performance and interoperability challenges, however, can present obstacles to cloud adoption and complicate data management. Techniques such as data silos, ETL processes and multiple data copies, which are commonly employed to accommodate cloud limitations, often offset the expected benefits of cloud infrastructure. Alluxio offers a new way to enhance the benefits of cloud infrastructure without the performance limitations or interoperability challenges that result from accessing disparate data sources in multiple, often remote, locations.
Alluxio is an open source software solution that connects analytics applications to heterogeneous data sources through a data orchestration layer that sits between compute and storage. It runs on commodity hardware, creating a shared data layer abstracting the files or objects in underlying persistent storage systems. Applications connect to Alluxio via a standard interface, accessing data from a single unified source.
This white paper discusses the data center challenges Alluxio addresses and the benefits it provides, and gives an overview of how it works.
Alluxio was created because we saw a need for innovation at the data layer, arising from the growing complexity of connecting multiple compute frameworks to an ever-expanding mix of storage systems and formats. Our approach uses a memory-centric architecture that abstracts files and objects in underlying persistent storage systems and provides a shared data access layer for compute applications.
Alluxio is not a persistent storage system. Instead, Alluxio serves as a data access layer, residing between any persistent storage system (such as Amazon S3, Microsoft Azure Object Store, Apache HDFS or OpenStack Swift) and computation frameworks (such as Apache Spark, Presto or Hadoop MapReduce). This white paper provides a technical overview of the Alluxio architecture and describes the data flow for common read and write scenarios.
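The idea of a data access layer that sits between compute and persistent storage can be illustrated with a small toy model. The sketch below is not Alluxio's API or implementation; the names `DictStore`, `DataAccessLayer`, `mount`, `read` and `write` are hypothetical, and the write-through and cache-on-read behavior is an assumption chosen for simplicity. It only shows the general pattern: several backing stores are mounted under one namespace, and reads are served from memory after the first access.

```python
from typing import Dict, Tuple


class DictStore:
    """In-memory stand-in for a persistent store (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}
        self.reads = 0  # counts how often the store itself is hit

    def read(self, path: str) -> bytes:
        self.reads += 1
        return self.objects[path]

    def write(self, path: str, data: bytes) -> None:
        self.objects[path] = data


class DataAccessLayer:
    """Toy unified namespace with an in-memory cache in front of mounted stores."""

    def __init__(self) -> None:
        self._mounts: Dict[str, DictStore] = {}
        self._cache: Dict[str, bytes] = {}

    def mount(self, prefix: str, store: DictStore) -> None:
        # Attach a backing store under a path prefix such as "/s3".
        self._mounts[prefix.rstrip("/")] = store

    def _resolve(self, path: str) -> Tuple[DictStore, str]:
        # Pick the longest mount prefix that matches the path.
        for prefix in sorted(self._mounts, key=len, reverse=True):
            if path == prefix or path.startswith(prefix + "/"):
                return self._mounts[prefix], path[len(prefix):]
        raise FileNotFoundError(path)

    def read(self, path: str) -> bytes:
        # Serve from memory when possible; otherwise fetch once and cache.
        if path in self._cache:
            return self._cache[path]
        store, rel = self._resolve(path)
        data = store.read(rel)
        self._cache[path] = data
        return data

    def write(self, path: str, data: bytes) -> None:
        # Write through to the backing store, then keep a cached copy.
        store, rel = self._resolve(path)
        store.write(rel, data)
        self._cache[path] = data
```

In this sketch an application would call `layer.read("/hdfs/b.txt")` or `layer.read("/s3/a.txt")` against the same object without knowing which store holds the data, and a repeated read would not touch the backing store again.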
Learn how Intel uses Alluxio to accelerate big data analytics in the cloud, and explore new opportunities for persistent memory in architectures that separate compute and storage.
Learn more about data unification for the digital economy and how Alluxio’s data orchestration brings your data to your compute, wherever it’s located.
Strata NY 2018 – Learn how to use Alluxio as a pluggable optimization component. Understand how JD.com uses Alluxio to provide support for ad hoc and real-time stream computing while ensuring consistency between Alluxio and HDFS.
Learn more about how MOMO, JD.com, and TalkingData use Alluxio.