data orchestration Archives

Alluxio at Beijing Meetup

June 25, 2019

Haoyuan Li presents at the Beijing Meetup on open source data orchestration and the value of Alluxio as rising trends drive the need for a new architecture. Four big trends are driving this need: separation of compute and storage, hybrid and multi-cloud environments, the rise of object stores, and self-service data across the enterprise.

Tags: big data, cloud, cloud storage, compute storage separation, data, data orchestration, hybrid cloud, meetup, multi cloud, storage

Embracing Data Silos — the journey through a fragmented data world

June 21, 2019 By Amelia Wong and Bin Fan

Over years of working in the big data and machine learning space, we have frequently heard from data engineers that the biggest obstacle to extracting value from data is the difficulty of accessing that data efficiently. Data silos, isolated islands of data, are often viewed by data engineers as the key culprit, or public enemy №1. There have been many attempts to do away with data silos, but those attempts have themselves produced yet more data silos, with data lakes being one such example. Rather than attempting to eliminate data silos, we believe the right approach is to embrace them.

Recap: Spark+AI Summit 2019

May 2, 2019 By Amelia Wong

Alluxio is a proud sponsor and exhibitor of Spark+AI Summit in San Francisco.
What’s Spark+AI Summit? It’s the world’s largest conference focused on Apache Spark, Alluxio’s older cousin: an open source project from the same lab (UC Berkeley’s AMPLab, now RISELab).

Alluxio Overview: Unify Data at Memory Speed

September 14, 2018 By Haoyuan Li and Bin Fan

Alluxio is an open source software solution that connects analytics applications to heterogeneous data sources through a data orchestration layer that sits between compute and storage.

Tags: alluxio engineering, big data, compute storage separation, data, data engineering, data orchestration, overview, storage, unified namespace

Alluxio at Strata + Hadoop World San Jose 2017

March 16, 2017 By Calvin Jia

Calvin Jia introduces Alluxio, explains how Alluxio can help Spark be more effective, shows benchmark results with Spark RDDs and DataFrames, and describes production deployments in which Alluxio and Spark work together.

Tags: alluxio engineering, apache spark, aws s3, ceph, conference, data, data engineering, data orchestration, Gluster, Google Cloud Storage, hdfs, NFS, performance, scale, spark, storage

Fast Big Data Analytics and Machine Learning Using Alluxio and Spark in Baidu

March 28, 2016

Strata+Hadoop World 2016: Baidu deployed Alluxio to accelerate its big data analytics workloads. Bin Fan and Haojun Wang explain why Baidu chose Alluxio and detail how the company achieved a 30x speedup with Alluxio in a production environment of hundreds of machines. Building on the success of the big data analytics engine, Baidu is expanding its Alluxio and Spark infrastructure to accelerate other applications, such as machine learning.

Tags: analytics, big data, compute, data, data engineering, data orchestration, machine learning, performance, storage
