Reducing large S3 API costs using Alluxio at Datasapiens


ALLUXIO GLOBAL ONLINE MEETUP

Datasapiens is an international data-analytics startup based in Prague. We help our clients to uncover the value of their data and open up new revenue streams for them. We provide an end-to-end service that manages the data pipeline and automates the process of generating data insights.

In this talk, we describe how we solved the problem of large S3 API costs incurred by Presto at several levels of usage concurrency by deploying Alluxio as a data orchestration layer between S3 and Presto. We also present the results of an experiment that estimates per-query S3 API costs using the TPC-DS dataset.

This talk will focus on:

  • The Hadoop ecosystem at Datasapiens
  • Drastic increase in S3 API costs during performance tests with Presto
  • S3 API cost tests with TPC-DS
  • Implications for the cloud data lake architecture

Speakers:

Koen Michiels holds a Master's degree in Marketing Communication Sciences and an Advanced Master of Science – magna cum laude – in Marketing Analysis from Ghent University. He worked for 7 years at dunnhumby in various roles, including promotions, trade intelligence, and shopper thoughts, in the UK. Later he became head of the solutions team for the CZ & SK market, spearheading innovation in cloud technologies, open-source software development, and interactive data visualizations. Koen has extensive experience in delivering insights at board level.


Juraj Pohanka leads technical development, covering application development, data engineering, and data science. He studied pure and applied mathematics at the Czech Technical University in Prague. Juraj previously worked at Deloitte as a financial modeler and at Deutsche Boerse as a software developer. He is passionate about modern technologies and mathematical models.

Bin Fan is the founding engineer and VP of Open Source at Alluxio, Inc. Prior to Alluxio, he worked at Google, building next-generation storage infrastructure. Bin received his Ph.D. in Computer Science from Carnegie Mellon University, where he worked on the design and implementation of distributed systems.

Questions? Chat on Slack with the speakers, users, and many other community members!
Join the Alluxio Global Online Meetup Group to attend online meetups like this one!
