Alluxio meetups, conferences, events and more
In this talk, we briefly introduce Alluxio, present several ways Alluxio can help Spark be more effective, show benchmark results with Spark RDDs and DataFrames, and describe production deployments with Alluxio and Spark working together. Along the way, we will also give live demos of some of the use cases.
In this talk, we discuss how Alluxio can be deployed and used with a Spark data processing pipeline in the cloud. We show how pipeline stages can share data through Alluxio memory for better performance, and how Alluxio can improve completion times and reduce performance variability for Spark pipelines in the cloud.
An overview of Alluxio basics, demonstrating how Alluxio works and how to use it to let distributed computation engines (like Spark or MapReduce) share data at memory speed. Through hands-on exercises, Yupeng and Rong walk you through deploying and running Alluxio, mounting external storage systems (like S3) into Alluxio’s namespace, interacting with Alluxio through its built-in commands and web UI, and building simple big data applications that use common computation frameworks (e.g., Apache Spark and Hadoop MapReduce) to read from and write to Alluxio.
Alluxio, formerly Tachyon, is a memory-speed virtual distributed storage system that leverages memory to store data and accelerate access to data in different storage systems. Many organizations use Alluxio with Apache Spark, and some of their deployments scale out to petabytes of data. Alluxio can make Spark even more effective in both on-premises and public cloud deployments. Alluxio bridges Spark applications with various storage systems and further accelerates data-intensive applications. In this talk, we briefly introduce Alluxio and present different ways Alluxio can help Spark jobs. We discuss best practices for using Alluxio with Spark, including RDDs and DataFrames, in both on-premises and public cloud deployments.
Learn about stream processing on Alluxio from real-world workloads at Qunar, as well as how to position Alluxio in a streaming architecture. Xueyan Li and Yupeng Fu explore how Alluxio delivered an average 300x performance improvement at service peak time on stream processing workloads at Qunar.
China Unicom is one of the five largest telecom operators in the world. China Unicom’s booming business in 4G and 5G networks has to serve an exploding base of hundreds of millions of smartphone users. This unprecedented growth brought enormous challenges and new requirements to its data processing infrastructure. The previous generation of its data processing system was based on IBM midrange computers, Oracle databases, and EMC storage devices. This architecture could not scale to process the amount of data generated by the rapidly expanding number of mobile users. Even after deploying Hadoop and the Greenplum database, it was still difficult to cover critical business scenarios with their varying massive data processing requirements. The complex architecture of the incumbent computing platform also made it hard to use resources effectively.
Haoyuan Li explores Alluxio’s goal of making its product accessible to an even wider set of users, through a focus on security, new language bindings, and further increased stability. Haoyuan also covers some new APIs Alluxio is working on to allow applications to access data more efficiently and manage data across different under storage systems.
Speed is usually a key factor when analyzing large amounts of data. Alluxio enables analytics applications, such as Apache Spark, to retrieve stored data at memory speeds. DC/OS makes it easy to deploy distributed programs (such as Alluxio and Spark) and containers across large clusters.
In this talk, we will first discuss the development of the DC/OS Alluxio package, which deploys Alluxio on top of DC/OS, and then demo the deployment of a complete analytics stack, both with and without Alluxio, to show the benefits Alluxio provides.
Haoyuan Li and Cheng Chang explain how Alluxio makes Spark more effective in both on-premises and public cloud deployments and share production deployments of Alluxio and Spark working together. Along the way, they discuss best practices for using Alluxio with Spark, including with RDDs and DataFrames.