In this issue, the Didi Technology Salon and the Alluxio community invited core engineers from Didi Travel, Alluxio, Kyligence, JD.com, and Tencent to share in depth with participants. Topics include Alluxio's position and design philosophy in the big data ecosystem, its architectural features and latest developments, how well-known companies have explored and applied it in production environments, and the experience they gained along the way.
Tag: <span>compute storage separation</span>
Alluxio, a memory-speed virtual distributed storage system, can be deployed on Mesos to connect any compute framework, such as Apache Spark, to storage systems via a unified namespace. Alluxio enables applications to interact with any data at memory speed. It can eliminate the pains of ETL and data duplication, and enable new workloads across all data. Gene will discuss how Mesos, Spark, and Alluxio fit together to achieve an optimal architecture for enterprises.
Many organizations and deployments use Alluxio with Apache Spark, and some of them scale out to petabytes of data. Alluxio can enable Spark to be even more effective, in both on-premise deployments and public cloud deployments. Alluxio bridges Spark applications with various storage systems and further accelerates data-intensive applications. In this talk, we briefly introduce Alluxio and present different ways in which Alluxio can help Spark jobs. We discuss best practices for using Alluxio with Spark, including RDDs and DataFrames, as well as on-premise deployments and public cloud deployments.
In this talk, we discuss how Alluxio can be deployed and used with a Spark data processing pipeline in the cloud. We show how pipeline stages can share data through Alluxio memory for improved performance, and how Alluxio can improve completion times and reduce performance variability for Spark pipelines in the cloud.
The rise of robotics applications demands new cloud architectures that deliver high throughput and low latency. Bin Fan and Shaoshan Liu explain how PerceptIn designed and implemented a cloud architecture to support video streaming and online object recognition tasks and demonstrate how Alluxio supports these emerging cloud architectures.
Alluxio has run in JD.com’s production environment on 100 nodes for six months. Mao Baolong, Yiran Wu, and Yupeng Fu explain how JD.com uses Alluxio to provide support for ad hoc and real-time stream computing, using Alluxio-compatible HDFS URLs and Alluxio as a pluggable optimization component. To give just one example, one framework, JDPresto, has seen a 10x performance improvement on average. This work has also extended Alluxio and enhanced the syncing between Alluxio and HDFS for consistency.
With the exponentially-growing deluge of data today, data lakes are pooling everywhere. So, how can you harness them for critical insights and is there an easy way to tap into the multitude of different storage systems that they're stored in? Enter Alluxio, an agnostic and fast storage abstraction, which, when paired with deep learning and GPU-accelerated analytics, yields a quick and easy way to harness the data. Join NVIDIA's Applied Solutions Engineering (ASE) team as they walk through how to use Alluxio for fun and profit.
Speeding Up Machine Learning in the Cloud with Alluxio
The future is the era of data, and an abstraction for efficiently managing, storing, and accessing data is undoubtedly a cornerstone of this era. Alluxio, an open source virtual distributed data system, is dedicated to providing simple and efficient data abstraction, convenient data sharing, and high-speed I/O for big data, machine learning, and artificial intelligence, while keeping applications and data persistent and offering a rich choice of storage systems. After several years of development, Alluxio has grown from the prototype of a research project involving only a few Ph.D. students and researchers in the AMPLab at the University of California, Berkeley, into a project with more than 800 code contributors (as of the Alluxio 1.8 release). It is deployed in production at Tencent, Baidu, JD, Two-Sigma, Barclays Bank, and hundreds of other industry leaders in China and abroad, where it has become an important part of the data platform and data infrastructure.