hadoop Archives

Reducing large S3 API costs using Alluxio at Datasapiens

August 5, 2020

In this talk, we describe how we solved the problem of large S3 API costs incurred by Presto at several levels of usage concurrency by implementing Alluxio as a data orchestration layer between S3 and Presto. We also show the results of an experiment that estimates per-query S3 API costs using the TPC-DS dataset.

Tags: data orchestration, datasapiens, hadoop, presto, s3 api

Reducing Large S3 API Costs Using Alluxio

July 30, 2020 By Juraj Pohanka (datasapiens), Koen Michiels (datasapiens) and Sam Gilbert (datasapiens)

This article describes how engineers at datasapiens brought down S3 API costs by 200x by implementing Alluxio as a data orchestration layer between S3 and Presto.

Adopting Satellite Clusters with Alluxio at Vipshop to Improve Spark Jobs for Targeted Advertising by 30x

July 25, 2020 By Gang Deng (Vipshop) and Jasmine Wang

As the third-largest e-commerce site in China, Vipshop processes large amounts of data collected daily to generate targeted advertisements for its consumers. In this article, Gang Deng from Vipshop describes how to meet SLAs by speeding up struggling Spark jobs on HDFS by up to 30x, and how to optimize hot data access with Alluxio to create …

Tag: hadoop

How does Cloudera’s hybrid cloud approach work and how does it compare with Alluxio’s “zero-copy” bursting approach?

How does the WANdisco Hybrid Data Lake Solution in AWS compare to zero-copy bursting to the cloud?

What do I do if Hadoop is slow?