Accelerating Spark Workloads in an Apache Mesos Environment with Alluxio

MesosCon North America 2017 *

Alluxio is an open-source, memory-speed virtual distributed storage system. Deployed on Mesos, it connects any compute framework, such as Apache Spark, to storage systems through a unified namespace, so applications can interact with any data at memory speed. Alluxio can eliminate the pains of ETL and data duplication and enable new workloads across all data. Adit will discuss how Mesos, Spark, and Alluxio fit together to form an optimal architecture for enterprises.

Apache Kylin And Alluxio Meetup

Shanghai Meetup *

As online services and clusters grow, the HDFS NameNode becomes a performance bottleneck, which hinders horizontal scaling of the HDFS cluster.
The community’s Federation + ViewFs solution addresses horizontal scaling of HDFS, but it is configured on the client side, which makes large-scale clusters harder to operate and manage. Using Alluxio as a unified entry point for multiple HDFS clusters simplifies operation and maintenance and also provides distributed caching.
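
The idea of a single entry point in front of several HDFS clusters can be illustrated with a toy mount table. This is only a sketch of the general technique (longest-prefix path resolution), not Alluxio's implementation; the `MOUNTS` table, the `resolve` function, and every cluster address and path are hypothetical.

```python
# Toy mount table: prefixes in one logical namespace map to separate HDFS clusters.
# All cluster addresses and paths here are made up for illustration.
MOUNTS = {
    "/": "hdfs://cluster-default:8020/",
    "/logs": "hdfs://cluster-a:8020/logs",
    "/warehouse": "hdfs://cluster-b:8020/warehouse",
}


def resolve(path: str) -> str:
    """Return the backing-store URI for a logical path (longest-prefix match)."""
    best = "/"
    for prefix in MOUNTS:
        if prefix == "/":
            continue
        # Match whole path components only, so "/logsx" does not match "/logs".
        if path == prefix or path.startswith(prefix + "/"):
            if len(prefix) > len(best):
                best = prefix
    suffix = path[len(best):].lstrip("/")
    base = MOUNTS[best].rstrip("/")
    return f"{base}/{suffix}" if suffix else base


if __name__ == "__main__":
    for p in ("/logs/2017/app.log", "/warehouse", "/tmp/x"):
        print(p, "->", resolve(p))
```

Because the table lives in one place, clients only need the logical path; moving a directory to another cluster means changing a single mount entry rather than every client's configuration.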

Accelerating Spark Workloads in a Mesos Environment with Alluxio

MesosCon Europe 2017 *

Alluxio is a memory-speed virtual distributed storage system. Deployed on Mesos, it connects any compute framework, such as Apache Spark, to storage systems through a unified namespace, so applications can interact with any data at memory speed. Alluxio can eliminate the pains of ETL and data duplication and enable new workloads across all data. Gene will discuss how Mesos, Spark, and Alluxio fit together to form an optimal architecture for enterprises.

Powering Robotics Clouds with Alluxio

Strata San Jose *

The rise of robotics applications demands new cloud architectures that deliver high throughput and low latency. Bin Fan and Shaoshan Liu explain how PerceptIn designed and implemented a cloud architecture to support video streaming and online object recognition tasks and demonstrate how Alluxio supports these emerging cloud architectures.

Using Alluxio (formerly Tachyon) as a fault-tolerant pluggable optimization component to compute frameworks of JD system

Strata London *

Alluxio has run in JD.com’s production environment on 100 nodes for six months. Mao Baolong, Yiran Wu, and Yupeng Fu explain how JD.com uses Alluxio to provide support for ad hoc and real-time stream computing, using Alluxio-compatible HDFS URLs and Alluxio as a pluggable optimization component. To give just one example, one framework, JDPresto, has seen a 10x performance improvement on average. This work has also extended Alluxio and enhanced the syncing between Alluxio and HDFS for consistency.

Unified Big Data Analytics – Any stack, Any Cloud

Boston Meetup *

This presentation focuses on how Alluxio helps the big data analytics stack become cloud-native. Cloud object storage systems, which are increasingly popular, offer more cost-effective and scalable storage than HDFS, but with different semantics and performance implications. Applications like Spark or Presto do not benefit from node-level locality or cross-job caching when retrieving data from cloud object storage. Deploying Alluxio in front of cloud storage solves these problems because data is retrieved and cached in Alluxio instead of being fetched repeatedly from the underlying cloud or object storage.
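
The caching behavior described above can be sketched as a read-through cache. This is a minimal illustration of the general pattern, not Alluxio's API; the `ReadThroughCache` class, the `fetch` callback, and the object key are all hypothetical stand-ins for a cache layer and a slow object store.

```python
from typing import Callable, Dict


class ReadThroughCache:
    """Serve repeated reads from memory; go to the backing store only on a miss."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch              # stand-in for a slow object-store read
        self._cache: Dict[str, bytes] = {}
        self.remote_reads = 0            # number of times the backing store was hit

    def read(self, key: str) -> bytes:
        if key not in self._cache:
            self.remote_reads += 1
            self._cache[key] = self._fetch(key)
        return self._cache[key]


if __name__ == "__main__":
    # Hypothetical object store: returns fake bytes for any key.
    cache = ReadThroughCache(lambda key: key.upper().encode())
    cache.read("bucket/part-0000")   # first job: fetched from the store
    cache.read("bucket/part-0000")   # second job: served from the cache
    print(cache.remote_reads)
```

A real deployment also has to handle eviction, consistency with the underlying store, and placement across nodes, which this sketch leaves out.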

Alluxio+Presto: An Architecture for Fast SQL in the Cloud

Bay Area Meetup *

Cloud object storage systems have different semantics and performance implications from HDFS. Applications like Presto cannot benefit from node-level locality or cross-job caching when reading from the cloud. Deploying Alluxio with Presto in front of cloud storage solves these problems because data is retrieved and cached in Alluxio instead of being fetched repeatedly from the underlying cloud or object storage. Bin will present an architecture that combines Presto with Alluxio, with use cases from major internet companies like JD.com and NetEase.com and the lessons they learned operating this architecture at scale.

Two Sigma Open Source Meetup

New York Meetup *

TSOS meetups focus on the open source projects that Two Sigma cares most about, from projects we built in-house and then open sourced to large external open source projects that we depend on to do our work. This time, Wenbo Zhao (Two Sigma) and Bin Fan (Alluxio) will present how Two Sigma uses Alluxio to make data-intensive compute independent of the storage beneath.