hybrid cloud Archives

“Zero-Copy” Hybrid Bursting with no App Changes

June 28, 2019

This whitepaper details how to leverage any public cloud (AWS, Google Cloud Platform, or Microsoft Azure) to scale analytics workloads directly on on-prem data without copying and synchronizing the data into the cloud. It shows an example of running on-demand Starburst Presto, Spark, and Hive with Alluxio in the public cloud against on-prem HDFS.

The paper also includes a real-world case study of a leading hedge fund based in New York City that deployed large clusters of Google Compute Engine VMs with Spark and Alluxio, using on-prem HDFS as the underlying storage tier.

Tags: apache hive, apache spark, aws, case study, hybrid cloud, presto

O’Reilly AI Conference Keynote: Data Orchestration for AI, Big Data, and Cloud

June 28, 2019

Haoyuan Li’s keynote at O’Reilly Beijing discusses open source data orchestration and the value of leveraging Alluxio as rising trends drive the need for a new architecture. Four big trends drive this need: separation of compute and storage, hybrid and multi-cloud environments, the rise of object stores, and self-service data across the enterprise.

Tags: big data, cloud, cloud object storage, cloud storage, compute storage separation, conference, data, data orchestration, hybrid cloud, multi cloud, on-prem object storage, storage

Tech Talk: Accelerate Spark Workloads on S3

June 28, 2019

While running analytics workloads using EMR Spark on S3 is a common deployment today, many organizations face issues in performance and consistency. EMR can be bottlenecked when reading large amounts of data from S3, and sharing data across multiple stages of a pipeline can be difficult as S3 is eventually consistent for read-your-own-write scenarios.

A simple solution is to run Spark on Alluxio as a distributed cache for S3. Alluxio stores data in memory close to Spark, providing high performance, in addition to providing data accessibility and abstraction for deployments in both public and hybrid clouds.
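As a rough illustration of the idea above, the sketch below maps S3 paths to the Alluxio paths a Spark job would read instead. It is not taken from the talk: the master hostname, the bucket name, and the `/s3/` mount point are hypothetical, and the helper `to_alluxio_uri` is invented for this example. The only fixed detail is that Alluxio paths use the `alluxio://` scheme and that 19998 is the default master RPC port.

```python
from urllib.parse import urlparse

# Hypothetical values for illustration only.
ALLUXIO_MASTER = "alluxio-master:19998"  # 19998 is Alluxio's default master RPC port
# Hypothetical mount table: an S3 prefix and the Alluxio path it is assumed to be mounted at.
MOUNTS = {"s3://example-bucket/": "/s3/"}


def to_alluxio_uri(s3_uri, mounts=MOUNTS, master=ALLUXIO_MASTER):
    """Return the alluxio:// URI that fronts an S3 object, based on the mount table."""
    scheme = urlparse(s3_uri).scheme
    if scheme not in ("s3", "s3a"):
        raise ValueError(f"not an S3 URI: {s3_uri}")
    # Treat s3a:// and s3:// the same way when matching mount prefixes.
    normalized = "s3://" + s3_uri.split("://", 1)[1]
    # Prefer the longest matching prefix so nested mounts resolve correctly.
    for prefix in sorted(mounts, key=len, reverse=True):
        if normalized.startswith(prefix):
            relative = normalized[len(prefix):]
            return f"alluxio://{master}{mounts[prefix]}{relative}"
    raise ValueError(f"no Alluxio mount covers {s3_uri}")


if __name__ == "__main__":
    print(to_alluxio_uri("s3a://example-bucket/logs/2019/06/part-0.parquet"))
```

In a Spark job, the returned string would be passed wherever the S3 path was used before, for example as the argument to a read call, assuming the Alluxio client library is available on the Spark classpath and the bucket is actually mounted at that path.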

Tags: aws, cloud, compute storage separation, data, data orchestration, emr, hybrid cloud, on-prem object storage, spark, tech talk

If you have a hybrid cloud architecture, using either a VPN or a dedicated high-speed circuit, does the network speed become a bottleneck in the hybrid data use case?

While adding a higher-bandwidth dedicated circuit will help, Alluxio data orchestration addresses the hybrid problem by making the data local to the compute nodes.

Community Office Hour: Running Spark & Alluxio in Kubernetes

June 25, 2019 by Bin Fan & Adit Madan

A data orchestration layer bridges the gap between data locality, which improves performance, and data accessibility for analytics workloads in Kubernetes, and it enables portability across storage providers.
This office hour gives an overview of Alluxio and the cloud use case with Spark in Kubernetes. Learn how to set up Alluxio and Spark to run in Kubernetes.

Tags: analytics, apache spark, compute, compute storage separation, data, data orchestration, hybrid cloud, kubernetes, locality, multi cloud, office hour, spark, storage

Alluxio at Beijing Meetup

June 25, 2019

Haoyuan Li presents at the Beijing Meetup on open source data orchestration and the value of leveraging Alluxio as rising trends drive the need for a new architecture. Four big trends drive this need: separation of compute and storage, hybrid and multi-cloud environments, the rise of object stores, and self-service data across the enterprise.

Tags: big data, cloud, cloud storage, compute storage separation, data, data orchestration, hybrid cloud, meetup, multi cloud, storage

Hybrid Environments for Data Analytics is a Possibility

June 21, 2019 By Madan Kumar and Adit Madan

As the data ecosystem becomes massively complex and more and more disaggregated, data analysts and end users have trouble adapting and working with hybrid environments. The proliferation of compute applications along with storage mediums leads to a hybrid model that we are just not accustomed to.
In this disaggregated system, data engineers now encounter a multitude of problems that they must overcome to get meaningful insights.

RocksDB Meetup at Twitter

Bay Area Meetup * July 11, 2019

Twitter SF is hosting 2019’s half-yearly RocksDB Meetup on July 11th, with speakers from Twitter, Facebook, and the community.

Tag: hybrid cloud