Alluxio (formerly Tachyon): The journey thus far and the road ahead

Strata+Hadoop World New York

The goal is to make Alluxio accessible to an even wider set of users through a focus on security, new language bindings, and further improvements in stability. In addition, the team is working on new APIs that allow applications to access data more efficiently and to manage data across different under storage systems.

How Alluxio (formerly Tachyon) brings a 300x performance improvement to Qunar’s streaming processing

Strata+Hadoop World Singapore

Alluxio is the first memory-speed virtual distributed storage system in the world. It unifies the interface between the various computing frameworks and under storage systems. Data access can be several orders of magnitude faster because of Alluxio’s memory-centric architecture. In addition, Alluxio’s tiered storage, unified namespace, flexible file API, web UI, and command-line tools improve usability across different application scenarios.
Qunar has been running Alluxio in production for over a year. Lei Xu explores how stream processing on Alluxio has led to a 16x performance improvement on average and 300x improvement at service peak time on workloads at Qunar.

Fighting Cybercrime: A Joint Task Force of Real-Time Data and Human Analytics

Spark Summit East

In this presentation, William Callaghan will focus on the challenges faced and lessons learned in building a human-in-the-loop cyber threat analytics pipeline. They will discuss the topic of analytics in cybersecurity and highlight the use of technologies such as Spark Streaming/SQL, Cassandra, Kafka, and Alluxio in creating an analytics architecture with mission-critical response times.

Effective Spark With Alluxio

Spark Summit East

In this talk, we briefly introduce Alluxio, present several ways Alluxio can help Spark be more effective, show benchmark results with Spark RDDs and DataFrames, and describe production deployments of Alluxio and Spark working together. Along the way, we provide live demos for some of the use cases.

Spark Pipelines in the Cloud with Alluxio

Big Data Day LA

In this talk, we discuss how Alluxio can be deployed and used with a Spark data processing pipeline in the cloud. We show how pipeline stages can share data through Alluxio memory for improved performance, and how Alluxio can improve completion times and reduce performance variability for Spark pipelines in the cloud.

Using Alluxio (formerly Tachyon) to Speed Up Big Data Analytics [Chinese]

Strata Data Conference Beijing

This session gives an overview of Alluxio basics, demonstrating how Alluxio works and how to use it to enable distributed computation engines (like Spark or MapReduce) to share data at memory speed. Using hands-on exercises, Yupeng and Rong walk you through deploying and running Alluxio, mounting external storage systems (like S3) into Alluxio’s namespace, interacting with Alluxio through its built-in commands and web UI, and building simple big data applications that use common computation frameworks (e.g., Apache Spark and Hadoop MapReduce) to read from and write to Alluxio.

Best Practices for Using Alluxio with Apache Spark

Spark Summit San Francisco 2017

Alluxio, formerly Tachyon, is a memory-speed virtual distributed storage system that leverages memory to store data and accelerate access to data in different storage systems. Many organizations and deployments use Alluxio with Apache Spark, and some of them scale out to petabytes of data. Alluxio can enable Spark to be even more effective, in both on-premises and public cloud deployments. Alluxio bridges Spark applications with various storage systems and further accelerates data-intensive applications. In this talk, we briefly introduce Alluxio and present different ways Alluxio can help Spark jobs. We discuss best practices for using Alluxio with Spark, including RDDs and DataFrames, in both on-premises and public cloud deployments.