spark Archives

Alluxio 2.0 Deep Dive & A Case of Real-time Processing with Spark

Bay Area Meetup * March 12, 2019

We are excited to present Alluxio 2.0 to our community. The goals of Alluxio 2.0 are to significantly enhance data accessibility with improved APIs, expand the supported use cases to include active workloads, and improve metadata management and availability to support hyperscale deployments. The Alluxio 2.0 Preview Release is the first major milestone on the path to Alluxio 2.0 and includes many new features.

Two Sigma Open Source Meetup

New York Meetup * March 25, 2019

TSOS meetups focus on the open source projects that Two Sigma cares most about, from projects we developed in-house and then open sourced to large external open source projects that we depend on to do our work. This time, Wenbo Zhao (Two Sigma) and Bin Fan (Alluxio) will present how Two Sigma uses Alluxio to make data-intensive compute independent of the storage beneath.

Unify Data Analytics: Any Stack Any Cloud | Webinar | Big Data Demystified

Alluxio Tech Talk * March 19, 2019

This webinar reviews: the trend toward separating storage and compute in the big data ecosystem; why and how to build a new data access layer between compute and storage in this data stack; Alluxio open source: history, overview, design, and architecture; production use cases with Spark, Presto, TensorFlow, and others; and a demo of running Presto on Alluxio on S3.

Efficient & Secure Big Data Analytics: Perspectives from Uber, Alibaba, & Alluxio

Seattle Meetup * March 17, 2019

Over the past two decades, the Big Data stack has been reshaped and has evolved quickly, with numerous innovations driven by the rise of many different open source projects and communities. In this meetup, speakers from Uber, Alibaba, and Alluxio will share best practices for addressing the challenges and opportunities in developing data architectures using new and emerging open source building blocks. Topics include data format (ORC) optimization, storage security (HDFS), data format (Parquet) layers, and unified data access (Alluxio) layers.

AVA – Qiniu AI Lab, CTrip, and Sogou Use Cases [Chinese]

October 1, 2018

Learn more about how Alluxio is used in the AVA deep learning platform, the Ctrip big data platform, and at Sogou.

Tags: big data, case study, hive, machine learning, spark

Accelerating Spark Workloads in a Mesos Environment

October 26, 2017 by Gene Pang

MesosCon Europe 2017 – Gene Pang discusses how Mesos, Spark, and Alluxio can be combined to achieve an optimal architecture for enterprises.

Tags: alluxio engineering, apache spark, architecture, aws s3, ceph, compute, conference, data, data engineering, Google Cloud Storage, hdfs, spark, storage, unified namespace

Best Practices for Using Alluxio with Apache Spark

June 6, 2017

Spark Summit SF 2017 – We briefly introduce Alluxio and present different ways Alluxio can help Spark jobs, along with best practices. We also discuss how Alluxio can be deployed and used with a Spark data processing pipeline in the cloud.

Tags: alluxio engineering, apache spark, aws, aws s3, cloud, cloud storage, conference, machine learning, spark

Alluxio (Formerly Tachyon): Unify Data At Memory Speed

April 2, 2017 by Gene Pang

Global Big Data Conference 2017 – In the past year, the Alluxio project experienced significant improvements in performance and scalability and was extended with key new features, including tiered storage, transparent naming, and unified namespace.

Tags: alluxio engineering, apache spark, big data, compute, conference, data, data engineering, performance, scale, spark, storage, tiered storage
