cloud Archives

Hybrid Data Lake Architecture with Presto & Spark in the cloud accessing on-prem storage

September 29, 2020

In this talk, we describe an architecture for incrementally migrating analytics workloads to any public cloud (AWS, Google Cloud Platform, or Microsoft Azure), running them directly on on-prem data without copying the data to cloud storage.

Tags: cloud, data analytics, data lake, hdfs, hybrid, on-prem, presto, spark, storage

Efficient Model Training in the Cloud with Kubernetes, TensorFlow, and Alluxio

May 22, 2020 By Rong Gu (Nanjing University) and Yang Che (Alibaba)

Alibaba, Alluxio, and Nanjing University collaborated to tackle the problems of deep learning model training in the cloud. Our goal was to reduce the cost and complexity of data access for deep learning training in a hybrid environment. The work resulted in a reduction of over 40% in training time and cost.

Build a hybrid data lake and burst processing to Google Cloud Dataproc with Alluxio

Alluxio Tech Talk * May 28, 2020

Join us for this tech talk, where we will show you how Alluxio can help burst your private computing environment to Google Cloud while minimizing costs and I/O overhead. Alluxio, coupled with Google’s open source data and analytics processing engine, Dataproc, enables zero-copy burst for faster query performance in the cloud. You can take advantage of resources that are not local to your data without having to manage copying or syncing that data.

Alluxio Accelerates Deep Learning in Hybrid Cloud using Intel’s Analytics Zoo open source platform powered by oneAPI

April 28, 2020

This article describes how Alluxio accelerates the training of deep learning models in a hybrid cloud environment with Intel’s Analytics Zoo open source platform, powered by oneAPI. It covers the new architecture and workflow, as well as Alluxio’s performance benefits and benchmark results.

Tags: analytics, analytics zoo, benchmark, big data, cloud, deep learning applications, hybrid cloud, intel, spark

Burst Presto & Spark workloads to AWS EMR with no data copies

April 28, 2020

In this talk, we will show you how to leverage any public cloud (AWS, Google Cloud Platform, or Microsoft Azure) to scale analytics workloads directly on on-prem data without copying and synchronizing the data into the cloud.

Tags: analytic workloads, cloud, hdfs, hybrid cloud, office hour, presto, public cloud, spark

Bursting Apache Spark Workloads to the Cloud on Remote Data

Community Online Office Hour * March 10, 2020

Accessing data to run analytic workloads in Spark across data centers and/or clouds can be challenging. In addition, network I/O can become a bottleneck for Spark jobs that need to read a large amount of data. A common solution is to deploy an HDFS cluster closer to Spark as a caching layer, manually copy the input data to HDFS first, and purge it afterward. But this ETL process can be both time-consuming and error-prone.

Tag: cloud