Learn about stream processing on Alluxio from real-world workloads at Qunar, as well as how to position Alluxio in the streaming architecture. Xueyan Li and Yupeng Fu explore how Alluxio has led to performance improvements averaging a 300x improvement at service peak time on stream processing workloads at Qunar.
Alluxio meetups, conferences, events and more
The latest Alluxio meetups, webinars, conferences and more
China Unicom is one of the five largest telecom operators in the world. China Unicom’s booming business in 4G and 5G networks has to serve an exploding base of hundreds of millions of smartphone users. This unprecedented growth brought enormous challenges and new requirements to the data processing infrastructure. The previous generation of its data processing system was based on IBM midrange computers, Oracle databases, and EMC storage devices. This architecture could not scale to process the amounts of data generated by the rapidly expanding number of mobile users. Even after deploying Hadoop and Greenplum database, it was still difficult to cover critical business scenarios with their varying massive data processing requirements. The complicated the architecture of its incumbent computing platform created a lot of new challenges to effectively use resources.
Haoyuan Li explores Alluxio’s goal of making its product accessible to an even wider set of users, through a focus on security, new language bindings, and further increased stability. Haoyuan also covers some new APIs Alluxio is working on to allow applications to access data more efficiently and manage data across different under storage systems.
Using Alluxio to Improve Spark & Hadoop HDFS System Performance and Reliability [Chinese]
In this presentation, William Callaghan will focus on the challenges faced and lessons learned in building a human-in-the loop cyber threat analytics pipeline. They will discuss the topic of analytics in cybersecurity and highlight the use of technologies such as Spark Streaming/SQL, Cassandra, Kafka and Alluxio in creating an analytics architecture with missions-critical response times.
In this talk, we briefly introduce Alluxio, present several ways how Alluxio can help Spark be more effective, show benchmark results with Spark RDDs and DataFrames, and describe production deployments both Alluxio and Spark working together. In the meantime, we will provide live demos for some of the use cases.
Enterprises typically store large amounts of data in existing storage systems, which are often separate from big data analytics systems. Therefore, importing petabytes of data into a big data analytics system takes a long time with large overheads and high costs. Even worse, transferring large amounts of data results in data silos and unnecessary duplication, which creates serious data management problems.
Alluxio is the first memory-speed virtual distributed storage system in the world. It unifies the interface between the various computing frameworks and under storages. Data access can be several magnitude faster because of Alluxio’s memory-centric architecture. In addition, Alluxio’s tiered storage, unified namespace, flexible file API, web UI, and command-line tools increase the usability in different application scenarios.
Qunar has been running Alluxio in production for over a year. Lei Xu explores how stream processing on Alluxio has led to a 16x performance improvement on average and 300x improvement at service peak time on workloads at Qunar.
The goal is to make Alluxio accessible to an even wider set of users through a focus on security, new language bindings, and further increased stability. In addition, the team is working on new APIs to allow applications to access data more efficiently and manage data across different under storage systems.