structured data services Archives

Introducing Alluxio 2.3

July 1, 2020 By Zac Blanco and Adit Madan

Alluxio 2.3.0 focuses on streamlining the user experience in hybrid cloud deployments where Alluxio is deployed with compute in the cloud to access data on-prem. Features such as environment validation tools and concurrent metadata synchronization greatly improve Alluxio’s functionality. Integrations with AWS EMR, Google Dataproc, K8s, and AWS Glue make Alluxio easy to use in a variety of cloud environments. In this article, we will share some of the highlights of the release. For more, please visit our release notes page.

Everything you want to know about how to decouple SQL engines from Hive Data Warehouse

March 30, 2020 By Gene Pang

Are you using SQL engines, such as Presto, to query existing Hive data warehouse and experiencing challenges including overloaded Hive Metastore with slow and unpredictable access, unoptimized data formats and layouts such as too many small files, or lack of influence over the existing Hive system and other Hive applications?

Optimizing Query Performance by Decoupling Presto and Hive Data Warehouse

March 24, 2020

Ideally, Presto would access data independently from how the data was originally stored or managed. Alluxio, as a data orchestration layer provides the physical data independence, for Presto to interact with the data more efficiently. In addition to caching for IO acceleration, Alluxio also provides a catalog service to abstract the metadata in the Hive Metastore, and transformations to expose the data in compute-optimized way. In this talk, we describe some of the challenges of using Presto with Hive, and introduce Alluxio data orchestration for solving those challenges.

Tags: alluxio engineering, catalog service, data orchestration, hive, office hour, performance, presto, structured data services

What’s new in Alluxio 2.2

March 11, 2020 By Bin Fan, Gene Pang, Zac Blanco and Haoyuan Li

With this release comes the General Availability (GA) of Alluxio Structured Data Services (SDS), the subsystem of Alluxio responsible for managing and transforming structured data, such as databases, tables, and partitions.

Tag: structured data services