data orchestration summit Archives | Page 2 of 5

Unified Data Access with Gimel

December 13, 2020

At PayPal, as at any other data-driven enterprise, data users and applications work with a variety of data sources (RDBMS, NoSQL, Messaging, Documents, Big Data, Time Series Databases), compute engines (Spark, Flink, Beam, Hive), languages (Scala, Python, SQL), and execution models (stream, batch, interactive) to process petabytes of data. Because of this complex matrix of technologies and thousands of datasets, engineers spend considerable time learning about different data sources, formats, programming models, APIs, and optimizations, which lengthens time-to-market (TTM).

Tags: data catalog, data orchestration, data orchestration summit, gimel, paypal

Accelerating Data Computation on Ceph Objects using Alluxio

December 13, 2020

In this talk, we will present how, using Alluxio, computation and storage ecosystems can interact better by benefiting from the “bringing the data close to the code” approach. Moving away from the complete disaggregation of computation and storage, data locality can improve computation performance. We will present our observations and test results, which show important improvements in accelerating Spark data analytics on Ceph object storage using Alluxio.

Tags: ceph, compute, data orchestration, data orchestration summit, spark, storage

The hidden engineering behind machine learning products at Helixa

December 13, 2020

In this talk, we will share some common pitfalls, lessons learned, and engineering practices encountered while building customer-facing enterprise ML products. In particular, we will focus on the engineering that delivers real-time audience insights every day to thousands of marketers via Helixa’s market research platform.

Tags: data orchestration, data orchestration summit, helixa

Speeding Up Atlas Deep Learning Platform with Alluxio + Fluid

December 13, 2020

Unisound focuses on artificial intelligence services for the Internet of Things. It is an artificial intelligence company with completely independent intellectual property rights and the world’s top intelligent voice technology. Atlas is the deep learning platform within Unisound AI Labs, which provides deep learning pipeline support for hundreds of algorithm scientists. This talk shares three real business training scenarios that leverage Alluxio’s distributed caching capabilities and Fluid’s cloud native capabilities to achieve significant training acceleration and resolve platform I/O bottlenecks. We hope that the practice of Alluxio and Fluid on the Atlas platform will benefit more companies and engineers.

Tags: atlas, data orchestration, data orchestration summit, deep learning, fluid

Fluid: When Alluxio Meets Kubernetes

December 13, 2020

Cloud native environments now attract many data-intensive applications, which are deployed and run on them because cloud native platforms and frameworks such as Docker and Kubernetes are efficient to deploy and easy to maintain. However, cloud native frameworks do not natively provide data abstraction support to applications. We therefore built the Fluid project, which co-orchestrates data and containers. We use Alluxio as the cache runtime inside Fluid to warm up hot data. In this talk, we will introduce the design and effects of the Fluid project.

Tags: data orchestration, data orchestration summit, fluid, kubernetes

Ultra Fast Deep Learning in Hybrid Cloud using Intel Analytics Zoo & Alluxio

December 13, 2020

Today, many people run deep learning applications with training data held in separate storage, such as object storage or remote data centers. This presentation will demo the Intel Analytics Zoo + Alluxio stack, an architecture that enables high performance while balancing cost and resource efficiency, without the network becoming an I/O bottleneck.

Tags: analytics zoo, data orchestration, data orchestration summit, hybrid cloud, intel

Hybrid Data Lake on Google Cloud with Alluxio and Dataproc

December 13, 2020

Dataproc is Google’s managed Hadoop and Spark platform. In this talk, we will showcase how to swiftly build a hybrid cloud data platform with Alluxio and Presto and migrate data seamlessly.

Tags: data orchestration, data orchestration summit, google dataproc, hybrid data lake
