In this talk, we present: trends and challenges in the data ecosystem in the cloud era; data engineering in the cloud with data orchestration; and use cases of tech stacks (Presto or TensorFlow) with Alluxio on S3.
This tutorial describes the steps to set up an EMR cluster with Alluxio as a distributed caching layer for Hive, and to run sample queries that access data in S3 through Alluxio.
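As a rough sketch of the setup described above (the master host `alluxio-master`, the bucket `s3://my-bucket`, and the table paths are all hypothetical; exact commands and property names vary by Alluxio and EMR version):

```shell
# Mount an S3 bucket into the Alluxio namespace (S3 credentials are assumed
# to be configured in alluxio-site.properties or via the instance profile).
alluxio fs mount /mnt/s3 s3://my-bucket/warehouse

# Point a Hive external table at the alluxio:// path instead of s3://,
# so repeated reads are served from Alluxio's cache after the first access.
hive -e "
CREATE EXTERNAL TABLE events (id BIGINT, payload STRING)
STORED AS PARQUET
LOCATION 'alluxio://alluxio-master:19998/mnt/s3/events';

SELECT COUNT(*) FROM events;
"
```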
This tech talk shares approaches to bursting data to the cloud and shows how Alluxio enables “zero-copy” bursting of Spark workloads to cloud services such as EMR and Dataproc. Learn how DBS Bank uses Alluxio to work around limited on-prem compute capacity.
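A minimal sketch of what “zero-copy” bursting looks like from the Spark side, assuming an Alluxio cluster runs alongside the cloud compute and the on-prem store is mounted under `/mnt/onprem` (the hostnames, jar path, and data paths here are hypothetical):

```shell
# Make the Alluxio client jar visible to Spark (the path varies by install).
CLIENT_JAR=/opt/alluxio/client/alluxio-client.jar

spark-submit \
  --conf spark.driver.extraClassPath=$CLIENT_JAR \
  --conf spark.executor.extraClassPath=$CLIENT_JAR \
  my_job.py

# Inside my_job.py, the job reads on-prem data through Alluxio, so no bulk
# copy into cloud storage is needed up front:
#   df = spark.read.parquet("alluxio://alluxio-master:19998/mnt/onprem/table")
```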
Learn more about Bazaarvoice’s use case leveraging Apache Spark, Hive, and Alluxio on S3, and see how to set up Hive with Alluxio so that Hive jobs can seamlessly read from and write to S3.
In this presentation, Ryte’s chapter lead engineer, Danny Linden, shows why and how Ryte solved challenging technical issues, dramatically improving the speed and reducing the costs of its AWS EMR Hadoop and Presto backend with Alluxio.
In this online presentation, we present how ING is leveraging Presto (interactive query), Alluxio (data orchestration & acceleration), S3 (massive storage), and DC/OS (container orchestration) to build and operate our modern Security Analytics & Machine Learning platform. We will share the challenges we encountered and how we solved them.
This article describes lessons from a previous project that moved a data pipeline, originally running on a Hadoop cluster managed by my team, to AWS using EMR and S3. The goal was to leverage the elasticity of EMR to offload operational work, and to make S3 a data lake where different teams can easily share data across projects.
This tech talk shows how to set up EMR Spark and Hive with Alluxio to seamlessly read from and write to your S3 data lake, and covers the performance benefits.
Hear about Bazaarvoice’s use case leveraging Apache Spark, Hive, and Alluxio on S3, and learn how to set up Hive with Alluxio so that Hive jobs can seamlessly read from and write to S3.