Enabling Ultra-fast Presto in the Cloud with Alluxio

This talk describes a stack of open-source projects to serve high-concurrent and low-latency SQL queries using Presto with Alluxio on big data in the cloud. Deploying Alluxio as a data orchestration layer to access cloud storage object storage (e.g., AWS S3), this architecture greatly enhances the data locality of Presto with distributed and cross-query caching, thus avoids reading the same data repeatedly from the cloud storage.

Tags: , , , , ,

Enabling Presto in the Cloud with Alluxio

Presto Summit New York *

The Presto Summit continues to bring together the best developers, engineers, data scientists, and executives from the Presto community to share how some of the largest and most innovative companies are using this technology to power their analytics platforms.

The Practice of Alluxio in Ctrip Real-Time Computing Platform

Today, real-time computation platform is becoming increasingly important in many organizations. In this article, we will describe how ctrip.com applies Alluxio to accelerate the Spark SQL real-time jobs and maintain the jobs’ consistency during the downtime of our internal data lake (HDFS). In addition, we leverage Alluxio as a caching layer to dramatically reduce the workload pressure on our HDFS NameNode.

Recap: Presto Summit SF 2019

Alluxio is a proud sponsor and exhibitor at the Presto Summit in San Francisco.
What’s Presto Summit? It’s the leading Presto conference co-organized by our partner Starburst Data and the Presto Software Foundation.

Building Fast SQL Analytics with Presto, Alluxio, and S3

Alluxio Community Office Hour *

Learn how to set up Presto with Alluxio such that Presto jobs can seamlessly read from and write to S3.
Compare the performance between Presto on S3 with Presto and Alluxio on S3.

Summertime themed In-Memory Computing extravaganza! (cross-post)

New York Meetup *

[Talk 1] A “how-to” presentation for building a real-time alerting, analytics and reporting system (at scale). With Denis Magda, vice president of the Apache Ignite PMC and director of product management at GridGain Systems. And Viktor Gamov, developer advocate at Confluent.
[Talk 2] Using In-Memory technology for real time analytics. With Andy Rivenes is a Product Manager at Oracle for Database In-Memory.
[Talk 3] Feeding data to the Kubernetes beast: bringing data locality to your containerized big data workloads. With Bin Fan, founding engineer of Alluxio, Inc. and PMC member of Alluxio open source project.

How do you partition Hive Table across storage systems using Alluxio?

Today when we create a Hive table, it is a common technique to partition the table across different values and ranges to improve query performance and reduce maintenance cost. However, Hive can not  access a single table directly using a single query with the data of this Hive table across different mediums of storage and … Continued