While looking for ways to streamline our data pipeline, we learned about Alluxio, an open source, memory speed, virtual distributed file system. We deployed Alluxio as the shared data layer for all of the intermediate stages in the data pipeline. By reading and writing data in Alluxio, the data can be read concurrently and stay in memory for the next stage of the pipeline. This increased the performance by speeding up the entire pipeline, and increased overall throughput of the pipeline allowing us to provide interactive response to our app users.
Category: Case Studies
Tencent is one of the largest technology companies in the world and a leader in multiple sectors such as social networking, gaming, e-commerce, mobile and web portal. Tencent News, one of Tencent’s many offerings, strives to create a rich, timely news application to provide users with an efficient, high-quality reading experience. To provide the best experience to more than 100 million monthly active users of Tencent News, we leverage Alluxio with Apache Spark to create a scalable, robust, and performant architecture.
Alluxio clusters act as a data access accelerator for remote data in connected storage systems. Temporarily storing data in memory, or other media near compute, accelerates access and provides local performance from remote storage. This capability is even more critical with the movement of compute applications to the cloud and data being located in object stores separate from compute. Caching is transparent to users, using read/write buffering to maintain continuity with persistent storage. Intelligent cache management utilizes configurable policies for efficient data placement and supports tiered storage for both memory and disk (SSD/HDD).
Lenovo is an Alluxio customer with a common problem and use case in the world of data analytics. They have petabytes of data in multiple data centers in different geographic locations. Analyzing it requires an ETL process to get all of the data in the right place. This is both slow, because data has to be transferred across the network, and costly because multiple copies of the data need to be stored. Freshness and quality of the data can also suffer as the data is also potentially out of date and incomplete because regulatory issues prevent certain data from being transferred.
OLAP (on-line analytical processing) technology has been widely adopted by enterprises since last century; Enterprises rely on OLAP to analyze their huge amount of data, generate reporting and so to help business people making decisions. Today in the era of big data, OLAP becomes more important and challenging than ever before; and cloud computing makes this further true. This article introduces how Kyligence, a cutting-edge big data intelligence company, leverages Alluxio to boost their performance in the cloud.
Deep learning algorithms have traditionally been used in specific applications, most notably, computer vision, machine translation, text mining, and fraud detection. Deep learning truly shines when the model is big and trained on large-scale datasets. Meanwhile, distributed computing platforms like Spark are designed to handle big data and have been used extensively. Therefore, by having deep learning available on Spark, the application of deep learning is much broader, and now businesses can fully take advantage of deep learning capabilities using their existing Spark infrastructure.