ALLUXIO COMMUNITY NEWSLETTER
The Data Orchestration Summit is less than two weeks away. It’s an amazing opportunity to learn about the key challenges and solutions for building modern data and AI platforms. Hear how others are tackling the toughest data engineering problems, discover interesting use cases, and compare different approaches and technologies. Only a few seats remain for the Presto and Alluxio hands-on lab that we are running with the creators of Presto. See the schedule for more details, and save your seat now!
Data Orchestration: What Is it, Why Is it Important?
Read this Q&A to learn more about a data orchestration platform that brings your data closer to compute across clusters, regions, clouds, and countries.
Effective Analytical Pipelines on AWS Using EMR, Alluxio, and S3
This article shares lessons learned from moving a data pipeline that originally ran on a Hadoop cluster to AWS using EMR and S3.
Introducing Wormhole: Dockerized Presto & Alluxio setups for blazing fast analytics
This blog post introduces Wormhole, an open-source Dockerized solution for deploying Presto & Alluxio clusters for blazing fast analytics on a file system.
This article describes how Baidu built a secure, modular, and extensible distributed file system service based on Alluxio in its Pingo project. You will learn how to use Alluxio to implement a unified distributed file system service, and how to add extensions on top of Alluxio, including customized authentication schemes and UDFs (user-defined functions) on Alluxio files.
Learn how to run Spark jobs against on-premises storage or even a different cloud provider’s storage.
How to set up an EMR cluster with Alluxio as a distributed caching layer for Hive, and run sample queries to access data in S3 through Alluxio.
How to configure Alluxio with a single master in a cluster and use HDFS as under storage.
November 7 – Computer History Museum
Learn why DBS turned to Alluxio’s bursting approach to solve challenges with their data stack.
Hear why leading companies are moving toward a decoupled compute and storage architecture, along with the associated challenges and requirements.
This session is designed for data scientists and data engineers who work with remote, and possibly multiple, data sources in hybrid or multi-cloud environments. Learn how to use Alluxio to greatly simplify data preparation in these environments.
Stop by booth #193 and win yourself a drone!
We hope to see you at the first Data Orchestration Summit on November 7.