In this meetup, Dipti and HY will present a new approach to hybrid analytical workloads using Alluxio, an open source data orchestration layer, which sits between compute and storage layer. Applications like Apache Spark or TensorFlow can then seamlessly access multiple disparate data sources with consistent performance using data locality and abstraction that the data orchestration tier brings.
Tag: cloud storage
This article aims to provide a different approach to help connect and make distributed files systems like HDFS or cloud storage systems look like a local file system to data processing frameworks: the Alluxio POSIX API. To explain the approach better, we used the TensorFlow + Alluxio + AWS S3 stack as an example.
Haoyuan Li’s keynote at O’Reilly Beijing discusses open source data orchestration and the value of leveraging Alluxio with rising trends driving the need for a new architecture. Four big trends driving this need: Separation of compute & storage, hybrid-multi cloud environments, rise of object store and self-service data across the enterprise.
Haoyuan Li presents at Beijing Meetup on open source data orchestration and the value of leveraging Alluxio with rising trends driving the need for a new architecture. Four big trends driving this need: Separation of compute & storage, hybrid-multi cloud environments, rise of object store and self-service data across the enterprise.
Twitter SF is hosting 2019’s half yearly RocksDB Meetup with speakers from Twitter, Facebook and the community on July 11th.
Join us June 24 in Menlo Park for our next meetup! We’ll have 3 valuable talks, a delicious BBQ dinner and amazing summertime-themed raffle prizes! This free event is sponsored by GridGain Systems and Oracle.
In this talk, we will focus on Alluxio design, its architecture, data flow and metadata flow. We will dive into the choices in its design space and share the experiences when implementing features like data tiering, storage options and cache eviction policies. We will also share our lessons in design, implementation and operation when working to build an open source distributed storage systems with 900 contributors for 5+ years.
The goal is to make Alluxio accessible to an even wider set of users through a focus on security, new language bindings, and further increased stability. In addition, the team is working on new APIs to allow applications to access data more efficiently and manage data across different under storage systems.
In this talk, we discuss how Alluxio can be deployed and used with a Spark data processing pipeline in the cloud. We show how pipeline stages can share data with Alluxio memory for improved performance benefits, and how Alluxio can improves completion times and reduces performance variability for Spark pipelines in the cloud.