The Practice of Presto & Alluxio in an E-commerce Big Data Platform

The company behind this platform is one of the largest e-commerce corporations. Its big data platform has tens of thousands of nodes and tens of petabytes of offline data, which require millions of Spark and MapReduce jobs to process every day. As the main query engine, Presto runs on thousands of machines and plays an important role in in-place analysis and BI tools. Meanwhile, Alluxio is deployed to improve Presto's performance. The practice of Presto & Alluxio on this platform benefits many engineers and analysts.

Unified Data Access with Gimel

At PayPal, as at any other data-driven enterprise, data users and applications work with a variety of data sources (RDBMS, NoSQL, messaging, documents, big data, time series databases), compute engines (Spark, Flink, Beam, Hive), languages (Scala, Python, SQL) and execution models (stream, batch, interactive) to process petabytes of data. Because of this complex matrix of technologies and thousands of datasets, engineers spend considerable time learning about different data sources, formats, programming models, APIs, optimizations, etc., which lengthens time-to-market (TTM).

Accelerating Data Computation on Ceph Objects using Alluxio

In this talk, we will present how, with Alluxio, computation and storage ecosystems can interact better by following the “bringing the data close to the code” approach. By moving away from the complete disaggregation of computation and storage, data locality can improve computation performance. We will also present our observations and test results, which show significant improvements when accelerating Spark data analytics on Ceph Object Storage with Alluxio.

Speeding Up Atlas Deep Learning Platform with Alluxio + Fluid

Unisound focuses on artificial intelligence services for the Internet of Things. It is an AI company with fully independent intellectual property rights and world-leading intelligent voice technology. Atlas is the deep learning platform within Unisound AI Labs, providing deep learning pipeline support for hundreds of algorithm scientists. This talk shares three real business training scenarios that leverage Alluxio’s distributed caching and Fluid’s cloud-native capabilities to significantly accelerate training and resolve the platform's I/O bottlenecks. We hope that the practice of Alluxio & Fluid on the Atlas platform will benefit more companies and engineers.

Fluid: When Alluxio Meets Kubernetes

Nowadays, many data-intensive applications are deployed and run in cloud-native environments, thanks to the easy deployment and maintenance offered by cloud-native platforms and frameworks such as Docker and Kubernetes. However, cloud-native frameworks do not natively provide data abstraction support to applications. We therefore built the Fluid project, which co-orchestrates data and containers. Fluid uses Alluxio as its cache runtime to warm up hot data. In this talk, we will introduce the design of the Fluid project and the results it achieves.
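The benefit of warming up hot data can be illustrated with a toy model. The Python sketch below is a hypothetical single-process read-through LRU cache in front of a slow remote store. `ReadThroughCache`, `warm_up` and `remote_reads` are illustrative names, not Fluid or Alluxio APIs. It only shows why prefetching hot data before a job starts removes remote reads from the job's critical path.

```python
from collections import OrderedDict
from typing import Callable, Dict, Iterable


class ReadThroughCache:
    """Toy LRU read-through cache in front of a slow remote store.

    Hypothetical illustration only; this is NOT the Alluxio or Fluid API.
    """

    def __init__(self, fetch_remote: Callable[[str], bytes], capacity: int) -> None:
        self._fetch_remote = fetch_remote  # slow path, e.g. remote object storage
        self._capacity = capacity
        self._entries: "OrderedDict[str, bytes]" = OrderedDict()
        self.remote_reads = 0  # number of times the slow path was taken

    def read(self, key: str) -> bytes:
        if key in self._entries:
            self._entries.move_to_end(key)  # hit: mark as most recently used
            return self._entries[key]
        # Miss: fetch from the remote store and keep a local copy.
        self.remote_reads += 1
        data = self._fetch_remote(key)
        self._entries[key] = data
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return data

    def warm_up(self, keys: Iterable[str]) -> None:
        """Prefetch hot keys before the workload starts."""
        for key in keys:
            self.read(key)


# Usage: a fake remote store of 8 shards and a cache that holds 4 of them.
remote: Dict[str, bytes] = {f"shard-{i}": bytes([i]) * 4 for i in range(8)}
cache = ReadThroughCache(fetch_remote=remote.__getitem__, capacity=4)

hot = ["shard-0", "shard-1", "shard-2"]
cache.warm_up(hot)  # 3 remote reads, paid before the job starts
for epoch in range(3):
    for key in hot:
        cache.read(key)  # every read is now served from the cache
print(cache.remote_reads)  # prints 3
```

Alluxio's real caching layer is distributed across workers and supports tiered storage and configurable eviction, so this single-process LRU model is deliberately simplified.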