Best Practices for Using Alluxio with Apache Spark

Spark Summit San Francisco 2017 *

Alluxio, formerly Tachyon, is a memory-speed virtual distributed storage system that leverages memory to store data and accelerate access to data in different storage systems. Many organizations use Alluxio with Apache Spark, and some deployments scale to petabytes of data. Alluxio can make Spark more effective in both on-premises and public cloud deployments. It bridges Spark applications with various storage systems and further accelerates data-intensive applications. In this talk, we briefly introduce Alluxio and present different ways in which it can help Spark jobs. We discuss best practices for using Alluxio with Spark, including with RDDs and DataFrames, in both on-premises and public cloud deployments.

How to Use Alluxio to improve Spark and Hadoop HDFS Performance of Data Access and System Reliability [Chinese]

Database Technology Conference China 2017 *

China Unicom is one of the five largest telecom operators in the world. China Unicom’s booming business in 4G and 5G networks has to serve an exploding base of hundreds of millions of smartphone users. This unprecedented growth brought enormous challenges and new requirements to the data processing infrastructure. The previous generation of its data processing system was based on IBM midrange computers, Oracle databases, and EMC storage devices. This architecture could not scale to process the amount of data generated by the rapidly expanding number of mobile users. Even after deploying Hadoop and the Greenplum database, it was still difficult to cover critical business scenarios with their varying, massive data processing requirements. The complicated architecture of its incumbent computing platform also created many new challenges in using resources effectively.

Best Practices for Using Alluxio with Spark

Strata Data Conference New York 2017 *

Haoyuan Li and Cheng Chang explain how Alluxio makes Spark more effective in both on-premises and public cloud deployments and share production deployments of Alluxio and Spark working together. Along the way, they discuss best practices for using Alluxio with Spark, including with RDDs and DataFrames.

Accelerating Spark Workloads in an Apache Mesos Environment with Alluxio

MesosCon North America 2017 *

Alluxio, an open-source memory-speed virtual distributed storage system, can be deployed on Mesos to connect any compute framework, such as Apache Spark, to storage systems through a unified namespace. Alluxio enables applications to interact with any data at memory speed. It can eliminate the pains of ETL and data duplication and enable new workloads across all data. Adit will discuss how Mesos, Spark, and Alluxio fit together to form an optimal architecture for enterprises.

Guardant Health: Fast, scalable, data processing with Alluxio, Mesos, and Minio

Alluxio and Mesos Joint Meetup *

Speed is usually a key factor when analyzing large amounts of data. Alluxio enables analytics applications, such as Apache Spark, to retrieve stored data at memory speeds. DC/OS makes it easy to deploy distributed programs (such as Alluxio and Spark) and containers across large clusters.
In this talk, we will first discuss the development of the DC/OS Alluxio package, which deploys Alluxio on top of DC/OS, and then demo the deployment of a complete analytics stack, both with and without Alluxio, to show the benefits Alluxio provides.

Apache Kylin And Alluxio Meetup

Shanghai Meetup *

As online services and clusters grow, the HDFS NameNode becomes a performance bottleneck for the HDFS cluster, which hinders the cluster's horizontal scaling.
The community’s Federation + ViewFs solution addresses horizontal scaling of HDFS, but it is configured on the client side, which makes large-scale clusters harder to operate and manage. Using Alluxio as a unified entry point for multiple HDFS clusters simplifies operation and maintenance and also provides a distributed caching capability.
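The idea of a single entry point in front of several HDFS clusters can be sketched as a mount table that maps paths in one namespace to backing-store URIs. The following is a minimal illustration only, not Alluxio's API: the `MountTable` class, its methods, and the NameNode host names (`nn1`, `nn2`) are hypothetical names chosen for this example.

```python
from typing import Dict


class MountTable:
    """Hypothetical mount table: unified-namespace paths -> backing-store URIs."""

    def __init__(self) -> None:
        self._mounts: Dict[str, str] = {}

    def mount(self, mount_point: str, target_uri: str) -> None:
        # Normalize trailing slashes so "/sales/" and "/sales" are the same mount.
        key = mount_point.rstrip("/") or "/"
        self._mounts[key] = target_uri.rstrip("/")

    def resolve(self, path: str) -> str:
        """Translate a unified path to the URI of the cluster that stores it."""
        best = None
        for mount_point in self._mounts:
            prefix = mount_point.rstrip("/") + "/"
            # Match whole path components only, so "/salesforce" is not under "/sales".
            if path == mount_point or path.startswith(prefix):
                # Keep the longest matching mount point.
                if best is None or len(mount_point) > len(best):
                    best = mount_point
        if best is None:
            raise KeyError(f"no mount point covers {path!r}")
        suffix = path if best == "/" else path[len(best):]
        return self._mounts[best] + suffix


# Two HDFS clusters (hypothetical hosts) exposed under one namespace.
table = MountTable()
table.mount("/sales", "hdfs://nn1:8020/warehouse/sales")
table.mount("/logs", "hdfs://nn2:8020/logs")
resolved = table.resolve("/sales/2017/part-0")
```

In this sketch the mapping lives in one place rather than in every client's configuration, which is the operational difference the abstract draws against a client-side ViewFs setup.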

Alluxio Exploration And Application Practice Meetup

Beijing Meetup *

In this session, the Didi Technology Salon and the Alluxio community invited core engineers from Didi, Alluxio, Kyligence, JD.com, and Tencent to discuss Alluxio’s position and design philosophy in the big data ecosystem, its architectural features and latest developments, and how well-known companies have explored and applied Alluxio in production environments. The speakers also shared the experience gained along the way in depth with participants.

Accelerating Spark Workloads in a Mesos Environment with Alluxio

MesosCon Europe 2017 *

Alluxio, a memory-speed virtual distributed storage system, can be deployed on Mesos to connect any compute framework, such as Apache Spark, to storage systems through a unified namespace. Alluxio enables applications to interact with any data at memory speed. It can eliminate the pains of ETL and data duplication and enable new workloads across all data. Gene will discuss how Mesos, Spark, and Alluxio fit together to form an optimal architecture for enterprises.