Alluxio on EMR: Fast Storage Access and Sharing for Spark Jobs

Traditionally, to run a single Spark job on EMR, you might follow these steps: launch a cluster; run the job, which reads data from a storage layer such as S3; perform transformations on an RDD/DataFrame/Dataset; and finally write the result back to S3. You end up with something like this.
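To make that flow concrete, here is a minimal sketch of such a job in Scala. The bucket name, paths, and column name are placeholders for illustration, not from the original post:

```scala
import org.apache.spark.sql.SparkSession

object S3Job {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("s3-read-transform-write")
      .getOrCreate()

    // Read input data from the storage layer (S3 path is a placeholder;
    // on EMR the s3:// scheme is handled by EMRFS).
    val events = spark.read.parquet("s3://my-bucket/input/events/")

    // Perform a transformation on the DataFrame.
    val counts = events.groupBy("event_type").count()

    // Send the result back to S3 (path is a placeholder).
    counts.write.mode("overwrite").parquet("s3://my-bucket/output/event-counts/")

    spark.stop()
  }
}
```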
If you add more Spark jobs across multiple clusters, you could end up with something like this.

Recap: Spark+AI Summit 2019

Alluxio was a proud sponsor and exhibitor at Spark+AI Summit in San Francisco.
What is Spark+AI Summit? It's the world's largest conference focused on Apache Spark – Alluxio's older open source cousin, which came out of the same lab (UC Berkeley's AMPLab, now RISElab).