How Alluxio (formerly Tachyon) brings a 300x performance improvement to Qunar’s streaming processing

Strata+Hadoop World Singapore *

Alluxio is the first memory-speed virtual distributed storage system in the world. It unifies the interface between the various computing frameworks and under storage systems. Data access can be several orders of magnitude faster because of Alluxio’s memory-centric architecture. In addition, Alluxio’s tiered storage, unified namespace, flexible file API, web UI, and command-line tools improve usability across different application scenarios.
Qunar has been running Alluxio in production for over a year. Lei Xu explores how stream processing on Alluxio has led to a 16x performance improvement on average, and a 300x improvement at peak service time, on workloads at Qunar.

Fighting Cybercrime: A Joint Task Force of Real-Time Data and Human Analytics

Spark Summit East *

In this presentation, William Callaghan focuses on the challenges faced and lessons learned in building a human-in-the-loop cyber threat analytics pipeline. The talk covers analytics in cybersecurity and highlights the use of technologies such as Spark Streaming/SQL, Cassandra, Kafka, and Alluxio in creating an analytics architecture with mission-critical response times.

Effective Spark With Alluxio

Spark Summit East *

In this talk, we briefly introduce Alluxio, present several ways Alluxio can help Spark be more effective, show benchmark results with Spark RDDs and DataFrames, and describe production deployments of Alluxio and Spark working together. Along the way, we provide live demos for some of the use cases.

How to Use Alluxio to improve Spark and Hadoop HDFS Performance of Data Access and System Reliability [Chinese]

Database Technology Conference China 2017 *

China Unicom is one of the five largest telecom operators in the world. China Unicom’s booming business in 4G and 5G networks has to serve an exploding base of hundreds of millions of smartphone users. This unprecedented growth brought enormous challenges and new requirements to the data processing infrastructure. The previous generation of its data processing system was based on IBM midrange computers, Oracle databases, and EMC storage devices. This architecture could not scale to process the amounts of data generated by the rapidly expanding number of mobile users. Even after deploying Hadoop and the Greenplum database, it was still difficult to cover critical business scenarios with their varying, massive data processing requirements. The complicated architecture of its incumbent computing platform also created many new challenges in using resources effectively.

Alluxio (Formerly Tachyon): The Journey thus Far and the Road Ahead

Vault Linux Storage and Filesystems Conference *

Haoyuan Li explores Alluxio’s goal of making its product accessible to an even wider set of users, through a focus on security, new language bindings, and further increased stability. Haoyuan also covers some new APIs Alluxio is working on to allow applications to access data more efficiently and manage data across different under storage systems.

Spark Pipelines in the Cloud with Alluxio

Big Data Day LA *

In this talk, we discuss how Alluxio can be deployed and used with a Spark data processing pipeline in the cloud. We show how pipeline stages can share data through Alluxio memory for improved performance, and how Alluxio can improve completion times and reduce performance variability for Spark pipelines in the cloud.

Using Alluxio (formerly Tachyon) to Speed Up Big Data Analytics [Chinese]

Strata Data Conference Beijing *

Yupeng and Rong give an overview of Alluxio basics, demonstrating how Alluxio works and how to use it to enable distributed computation engines (like Spark or MapReduce) to share data at memory speed. Using hands-on exercises, they walk you through deploying and running Alluxio, mounting external storage systems (like S3) into Alluxio’s namespace, interacting with Alluxio through its built-in commands and web UI, and building simple big data applications using common computation frameworks (e.g., Apache Spark and Hadoop MapReduce) to read from and write to Alluxio.