aws Archives | Page 2 of 3

“Zero-Copy” Hybrid Cloud for Data Analytics – Strategy, Architecture and Benchmark Report

April 6, 2020

This whitepaper details how to leverage a public cloud, such as Amazon AWS, Google GCP, or Microsoft Azure, to scale analytic workloads directly on on-premises data without copying and synchronizing that data into the cloud. We show an example of running on-demand Presto and Hive with Alluxio in the public cloud against on-prem HDFS. We also show how to set up and execute performance benchmarks in two geographically dispersed Amazon EMR clusters, along with a summary of our findings.

Tags: aws, azure, data analytics, emr, gcp, hdfs, hive, hybrid cloud, presto, public cloud, zero copy

Testing Distributed System at Scale for the Cost of a Large Pizza on AWS

February 25, 2020

Building distributed systems is no small feat. Software testing is just one of many critical practices that engineers who build these systems need to use to ensure the quality and usability of their software. For distributed systems, scaling out testing frameworks to cover the highly distributed environments in which enterprises run our software is a complicated (and expensive!) task.

Tags: aws, distributed systems, office hour, scale, testing

Enabling big data & AI workloads on the object store at DBS

October 14, 2019

Vitaliy and Dipti dive into how DBS Bank built a modern big data analytics stack, leveraging an object store as persistent storage even for data-intensive workloads, and how it uses Alluxio to orchestrate data locality and data access for Spark workloads.

Tags: aws, big data, conference, hybrid cloud bursting, object stores, unified namespace

Online Meetup: AWS S3 + Alluxio + Presto = ❤️ The Ryte Use Case

October 10, 2019

This online meetup shows why and how we used Alluxio to solve some challenging technical issues, improve the speed, and reduce the costs of our AWS EMR Hadoop & Presto backend.

Tags: aws, aws s3, emr, hadoop, presto

Getting Started with EMR Hive on Alluxio in 10 Minutes

October 8, 2019 By Bin Fan

This tutorial describes steps to set up an EMR cluster with Alluxio as a distributed caching layer for Hive, and run sample queries to access data in S3 through Alluxio.

Recap: AWS Summit New York

July 22, 2019 By Amelia Wong

Alluxio was a proud sponsor and exhibitor at the AWS Summit in New York. If you weren’t able to attend, here are the highlights.

Turn cloud storage or HDFS into your local file system for faster AI model training with TensorFlow

July 3, 2019 By Lu Qiu and Bin Fan

This article presents a different approach to making distributed file systems like HDFS, or cloud storage systems, look like a local file system to data processing frameworks: the Alluxio POSIX API. To explain the approach better, we use the TensorFlow + Alluxio + AWS S3 stack as an example.

Tag: aws