performance Archives | Page 3 of 13

Accelerating Queries on Cloud Data Lakes

ITPro Today Webinar * August 20, 2020

Join us for this webinar where Alex Ma of Alluxio, an open source data orchestration platform, will discuss how a data orchestration approach offers a solution for connecting traditional on-prem data centers with the cloud, data centers with other data centers, and clouds with other clouds. With Alluxio’s “zero-copy” burst solution, companies can bridge remote data centers with computing frameworks in other locations, enabling them to offload compute and leverage the flexibility, scalability, and power of the cloud for their remote data.

Bursting Spark or Presto Jobs to AWS using Alluxio

June 23, 2020

In this office hour, we demonstrate how a “zero-copy burst” solution helps to speed up Spark and Presto queries in the public cloud while eliminating the process of manually copying and synchronizing data from the on-premise data lake to cloud storage. This approach allows compute frameworks to decouple from on-premise data sources and scale efficiently by leveraging Alluxio and public cloud resources such as AWS.

Tags: aws, cloud storage, compute, hdfs, hybrid cloud, office hour, performance, presto, spark, zero copy bursting

Bursting Spark or Presto Jobs to AWS using Alluxio

Community Online Office Hour * June 23, 2020

Efficient Model Training in the Cloud with Kubernetes, TensorFlow, and Alluxio

May 22, 2020 By Rong Gu (Nanjing University) and Yang Che (Alibaba)

A collaboration of Alibaba, Alluxio, and Nanjing University in tackling the problems of Deep Learning model training in the cloud. Our goal was to reduce the cost and complexity of data access for Deep Learning training in a hybrid environment, which resulted in over 40% reduction in training time and cost.

Accelerate and Scale Big Data Analytics with Alluxio and Intel® Optane™ Persistent Memory

May 8, 2020

International Data Corporation (IDC) reported that the global datasphere will grow from 33 zettabytes in 2018 to 175 zettabytes by 20251. This trend becomes more and more complicated with the variety and velocity of data growth, and it continuously changes the ways data is collected, stored, processed, and analyzed. New analytics solutions, including machine learning, deep learning, and artificial intelligence (AI), and new architectures and tools are being developed to extract and deliver value from the huge datasphere.

Tags: analytics, big data, hybrid cloud, intel, open source, performance, persistent memory

Optimizing Latency-Sensitive Queries for Presto at Facebook: A Collaboration Between Presto & Alluxio

May 7, 2020

For many latency-sensitive SQL workloads, Presto is often bound by retrieving distant data. In this talk, Rohit Jain, James Sun from Facebook and Bin Fan from Alluxio will introduce their teams’ collaboration on adding a local on-SSD Alluxio cache inside Presto workers to improve unsatisfied Presto latency.

Tags: local cache, meetup, performance, presto, sql workloads

Build a hybrid data lake and burst processing to Google Cloud Dataproc with Alluxio

Alluxio Tech Talk * May 28, 2020

Join us for this tech talk where we will show you how Alluxio can help burst your private computing environment to Google Cloud, minimizing costs and I/O overhead. Alluxio coupled with Google’s open source data and analytics processing engine, Dataproc, enables zero-copy burst for faster query performance in the cloud so you can take advantage of resources that are not local to your data, without the need for managing the copying or syncing of that data.

Optimizing Latency-Sensitive Queries for Presto at Facebook: A Collaboration Between Presto & Alluxio

Alluxio Global Online Meetup * May 7, 2020

Optimizing Query Performance by Decoupling Presto and Hive Data Warehouse

March 24, 2020

Ideally, Presto would access data independently from how the data was originally stored or managed. Alluxio, as a data orchestration layer provides the physical data independence, for Presto to interact with the data more efficiently. In addition to caching for IO acceleration, Alluxio also provides a catalog service to abstract the metadata in the Hive Metastore, and transformations to expose the data in compute-optimized way. In this talk, we describe some of the challenges of using Presto with Hive, and introduce Alluxio data orchestration for solving those challenges.

Tags: alluxio engineering, catalog service, data orchestration, hive, office hour, performance, presto, structured data services

Tag: performance