alluxio engineering Archives

Testing Distributed Systems in the Big Data Ecosystem at 1000+ node Scale

January 20, 2019

Testing distributed systems at scale is typically a costly yet necessary process. At Alluxio we take testing very seriously because organizations across the world rely on our technology, so one problem we want to solve is how to test at scale without breaking the bank. In this blog post we show how the maintainers of the Alluxio open source project build and test the system at scale cost-effectively using public cloud infrastructure. We test with the most popular frameworks, such as Spark and Hive, and with pervasive storage systems, such as HDFS and S3. Using AWS EC2, we are able to test clusters of 1000+ workers at a cost of about $16 per hour.

Tags: alluxio engineering, aws s3, distributed systems, scale

Testing Distributed Systems at 1000+ node Scale for the Cost of a Large Pizza, and yes, on AWS!

January 17, 2019 by Zac Blanco

Alluxio Overview: Unify Data at Memory Speed

September 14, 2018 by Haoyuan Li & Bin Fan

Alluxio is an open source software solution that connects analytics applications to heterogeneous data sources through a data orchestration layer that sits between compute and storage.

Tags: alluxio engineering, big data, compute storage separation, data, data engineering, data orchestration, overview, storage, unified namespace

Alluxio in MOMO, JD.com, TalkingData, and Vipshop [Chinese]

August 24, 2018

Learn more about how MOMO, JD.com, and TalkingData use Alluxio in production.

Tags: alluxio engineering, analytics, caching, cloud object storage, cloud storage, compute, compute storage separation, data, on-prem object storage, performance, storage

The Architecture of Decoupling Compute and Storage with Alluxio

December 15, 2017 by Calvin Jia & Haoyuan Li

Strata Singapore 2017 – Hear about how to decouple compute and storage with Alluxio, exploring the decision factors and considerations, along with production best practices and solutions.

Tags: alluxio engineering, compute storage separation, locality, performance

Alluxio at Spark Summit EU 2017

October 26, 2017 by Gene Pang

We briefly introduce Alluxio and present different ways Alluxio can help Spark jobs, along with best practices. We also discuss how Alluxio can be deployed and used with a Spark data processing pipeline in the cloud.

Tags: alluxio engineering, apache spark, architecture, aws s3, cloud, cloud storage, conference, developer tips, hybrid cloud, machine learning, rdd

Accelerating Spark Workloads in a Mesos Environment

October 26, 2017 by Gene Pang

MesosCon Europe 2017 – Gene Pang discusses how Mesos, Spark, and Alluxio can be combined into an optimal architecture for enterprises.

Tags: alluxio engineering, apache spark, architecture, aws s3, ceph, compute, conference, data, data engineering, Google Cloud Storage, hdfs, spark, storage, unified namespace

Best Practices for Using Alluxio with Apache Spark

June 6, 2017

Spark Summit SF 2017 – We briefly introduce Alluxio and present different ways Alluxio can help Spark jobs, along with best practices. We also discuss how Alluxio can be deployed and used with a Spark data processing pipeline in the cloud.

Tags: alluxio engineering, apache spark, aws, aws s3, cloud, cloud storage, conference, machine learning, spark
